Towards Inferring Network Properties from Epidemic Data

Epidemic propagation on networks represents an important departure from traditional mass-action models. However, the high dimensionality of the exact models poses a challenge to both mathematical analysis and parameter inference. By using mean-field models, such as the pairwise model (PWM), the high dimensionality becomes tractable. While such models have been used extensively for model analysis, there is limited work in the context of statistical inference. In this paper, we explore the extent to which the PWM with the susceptible-infected-recovered (SIR) epidemic can be used to infer disease- and network-related parameters. Data from an epidemic can be loosely categorised as population-level, e.g., daily new cases, or individual-level, e.g., recovery times. To understand if and how network inference is influenced by the type of data, we employed the widely used maximum-likelihood (MLE) approach for population-level data and dynamical survival analysis (DSA) for individual-level data. For scenarios in which there is no model mismatch, such as when data are generated via simulations, both methods perform well despite strong dependence between parameters. In contrast, for real-world data, such as foot-and-mouth disease, H1N1 and COVID-19, the DSA method appears fairly robust to potential model mismatch and produces parameter estimates that are epidemiologically plausible, whereas our results with the MLE method revealed several issues pertaining to parameter unidentifiability and a lack of robustness to imperfect knowledge of key quantities such as the population size and/or the proportion of under-reporting. Taken together, however, our findings suggest that network-based mean-field models can be used to formulate approximate likelihoods which, coupled with an efficient inference scheme, make it possible to learn not only the parameters of the disease dynamics but also those of the underlying network.


Introduction
Exact mathematical models for describing the spread of epidemics on networks are often insoluble or intractable for large networks [16,13]. 'Mean-field' models provide a solution by introducing approximations and focusing on quantities at the population level, such as the expected number of infected or susceptible individuals, or the number of direct connections between two such groups [15]. Many mean-field models exist to describe the dynamics of epidemic processes on networks. They usually take the form of a system of ODEs describing these processes [6]. Such models typically involve applying a 'closure' to exact models. Closures rely on (usually simplifying) assumptions about the underlying contact network and/or even the dynamics, and these assumptions bring the complexity of a given system to manageable levels [20].
Modelling epidemics on networks using mean-field approximations is a well-studied and active area of research [17,1]. In both theoretical and applied settings, it is used for parameter estimation, prediction and informing intervention or policy making [2], as recently demonstrated during the COVID-19 global pandemic [18]. However, there is a lack of understanding as to how such models perform when used for statistical inference. The starting point is the modelling of population contact structures as a network of nodes connected by links which represent possible routes of disease transmission. The network can be represented by an adjacency matrix G = (g_ij)_{i,j=1,2,...,N}, where N is the number of nodes; the entries are either zero or one, the matrix is symmetric, and all elements on the main diagonal are zero, i.e., no self-loops are allowed. In this paper, we will focus on regular or homogeneous networks where each node has exactly n links.
When modelled as a continuous-time Markov chain, a stochastic susceptible-infected-recovered (SIR) epidemic on a network results in a state space of size 3^N, since each of the N nodes can independently be S, I or R, and each state, that is, a labelled network, needs an equation [6]. This of course makes the model intractable both theoretically and numerically, even at modest values of N. Gillespie [5] simulations can help deal with this problem and enable us to produce true stochastic paths of the process; see Figure 1 for example. This is based on the simple principle that, in the Markovian framework, infection and recovery are independent Poisson processes with rates τ and γ, respectively. τ is the per-link rate of infection, i.e., the rate at which the I (infected) node in an I-S link infects the S (susceptible) node. This process is network-dependent. All infected nodes recover independently of the network and of each other at rate γ.
One way to move beyond simulations while dealing with the challenges of intractable high-dimensional models is to use mean-field models that focus on some expected quantity of the exact system, such as the expected number of infected nodes or the expected number of pairs of various types (e.g., S-S and S-I). One widely used model is the pairwise model [6], which is briefly described below.

Pairwise model as an approximation of epidemics on networks
In essence, the pairwise model relies on a hierarchical construction where the expected number of nodes in state A at time t, [A](t), depends on the expected number of pairs of various types (e.g., [AB]), and these, in turn, depend on triples such as [ABC]. Here, the counting is done in all possible directions, meaning that [SS] pairs are counted twice and [SI] = [IS]. With this in mind, the pairwise model becomes

[\dot{S}] = -\tau [SI], \quad [\dot{I}] = \tau [SI] - \gamma [I], \quad [\dot{R}] = \gamma [I], \quad (1)

[\dot{SI}] = -(\tau + \gamma)[SI] + \tau([SSI] - [ISI]), \quad [\dot{SS}] = -2\tau [SSI]. \quad (2)

This system is not self-consistent, as pairs depend on triples and equations for these are needed. This, however, would lead to an explosion in system size, as triples would then depend on quadruples connected in ways different from the simple line graphs over four nodes. To tackle this dependency on higher-order moments, the triples in equation (2) are closed using the relation

[ABC] \approx \frac{n-1}{n} \frac{[AB][BC]}{[B]}, \quad (3)

where A, B, C \in \{S, I, R\}. Applying this closure leads to

[\dot{S}] = -\tau [SI], \quad (4)
[\dot{I}] = \tau [SI] - \gamma [I], \quad (5)
[\dot{R}] = \gamma [I], \quad (6)
[\dot{SI}] = -(\tau + \gamma)[SI] + \tau \frac{n-1}{n} \frac{[SI]([SS] - [SI])}{[S]}, \quad (7)
[\dot{SS}] = -2\tau \frac{n-1}{n} \frac{[SI][SS]}{[S]}, \quad (8)

which is now a self-contained system. For a chosen set of parameters (n, τ, γ) and initial conditions, the system above can be numerically integrated, furnishing us with [I](t), for example. As it turns out, see Figure 1, this low-dimensional mean-field model is exact in the asymptotic limit of N → ∞, and the numerical solution of the PW model is indistinguishable from the average of stochastic realisations. We note that there are necessary and sufficient conditions which guarantee that the PW model is exact in the limit of large network sizes. In particular, this holds for networks with Binomial (with Regular being a special case of Binomial), Poisson and Negative Binomial degree distributions [12,10]. Using R_0 = \tau(n-1)/(\tau+\gamma), i.e., \tau = \gamma R_0/(n-1-R_0), the closed pairwise equations can be reparameterised to include R_0 explicitly. Using ξ to denote \xi = \frac{n-1}{n}, the re-parameterised system now reads

[\dot{S}] = -\frac{\gamma R_0}{n-1-R_0} [SI], \quad (9)
[\dot{I}] = \frac{\gamma R_0}{n-1-R_0} [SI] - \gamma [I], \quad (10)
[\dot{R}] = \gamma [I], \quad (11)
[\dot{SI}] = -\frac{\gamma (n-1)}{n-1-R_0} [SI] + \xi \frac{\gamma R_0}{n-1-R_0} \frac{[SI]([SS]-[SI])}{[S]}, \quad (12)
[\dot{SS}] = -2\xi \frac{\gamma R_0}{n-1-R_0} \frac{[SI][SS]}{[S]}. \quad (13)
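For a chosen set of parameters, the closed pairwise system can be integrated with a standard ODE solver. The sketch below is our own minimal illustration (function and variable names are ours, not the authors' code), assuming pairs are counted in both directions so that [SS](0) = n S_0^2/N and [SI](0) = n S_0 I_0/N:

```python
import numpy as np
from scipy.integrate import solve_ivp

def pairwise_rhs(t, y, n, tau, gamma):
    """Closed pairwise SIR model for a regular network of degree n."""
    S, I, SI, SS = y
    xi = (n - 1) / n  # closure correction factor (n-1)/n
    pressure = tau * xi * SI / max(S, 1e-12)  # tau*xi*[SI]/[S]
    dS = -tau * SI
    dI = tau * SI - gamma * I
    dSI = -(tau + gamma) * SI + pressure * (SS - SI)
    dSS = -2 * pressure * SS
    return [dS, dI, dSI, dSS]

def solve_pairwise(N=10000, n=6, tau=0.03, gamma=1/14, I0=1, t_max=150):
    S0 = N - I0
    # Pairs counted in both directions: [SS](0) = n*S0^2/N, [SI](0) = n*S0*I0/N.
    y0 = [S0, I0, n * S0 * I0 / N, n * S0 * S0 / N]
    sol = solve_ivp(pairwise_rhs, (0, t_max), y0, args=(n, tau, gamma),
                    t_eval=np.arange(t_max + 1), rtol=1e-8, atol=1e-8)
    return sol.t, sol.y
```

Daily new cases on day i can then be read off as [S](i) − [S](i+1), which is the quantity fitted throughout the paper.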

Data
Typically, real-world data for inference come as daily counts of some quantity of interest (e.g., daily new cases or daily deaths) at discrete time steps, that is, y = (y_1, ..., y_n) with y_i ∈ {0, ..., N}, observed at times (t_1, ..., t_n), where y and t denote the counts and times respectively. In this paper, we will consider three types of data, which are described below.

Data: PWM output with noise
Since the mean-field model is an approximation of the true stochastic process, we start by simulating data directly from the mean-field model, with varying levels of noise dispersion added, in order to assess the ability of the inference schemes to recover the expected parameters, i.e., those used to generate the data (before noise). Since we mainly fit to daily reported cases, we first solve the PW model numerically with a given set of parameters and compute the daily new cases on day i, ([S](i+1) − [S](i)). Observations begin on the first day, at the earliest, and the initial conditions of the PWM are set at t = 0. Noise is introduced using draws from the Negative Binomial distribution. This is done such that the mean of the distribution is given by the model and the variance is controlled by the experimenter. Given a daily new cases count, y_d, from the true model without noise, we draw a sample from a Negative Binomial distribution with mean y_d and variance y_d + k y_d^2, with k the dispersion parameter. The Negative Binomial distribution is interpreted as giving the probability of observing y failures before m successes, each with success probability p, that is

P(Y = y) = \binom{y + m - 1}{y} p^m (1 - p)^y,

with m = 1/k and p = 1/(1 + k y_d) chosen so that the mean and variance match the above.
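The noise step can be sketched as follows. The parameterisation r = 1/k, p = 1/(1 + k·y_d) is our own derivation from matching the stated mean y_d and variance y_d + k·y_d²; the function name is illustrative:

```python
import numpy as np

def add_nb_noise(daily_cases, k, rng=None):
    """Contaminate model-generated daily case counts y_d with negative
    binomial noise of mean y_d and variance y_d + k * y_d**2."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(daily_cases, dtype=float)
    # NumPy parameterisation: r 'successes' with success probability p,
    # mean r*(1-p)/p and variance r*(1-p)/p**2. Matching the target
    # mean mu and variance mu + k*mu**2 gives r = 1/k, p = 1/(1 + k*mu).
    r = 1.0 / k
    p = 1.0 / (1.0 + k * y)
    return rng.negative_binomial(r, p)
```

Small k therefore corresponds to low over-dispersion, consistent with k = 0.0005 being described as "very low dispersion" later in the paper.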

Data: stochastic simulations
Since the real challenge is to fit to stochastic data, in the first instance we consider simulated data constructed using the Gillespie algorithm [5] for a stochastic SIR epidemic on an explicit network of contacts. The idea behind the simulation is rather simple. Each node has its own rate, resulting in a rate vector (r_i)_{i=1,2,...,N}. A susceptible node with m infected neighbours has rate τm, and an infected node has rate γ. Recovered or removed nodes have rate zero, as they no longer play a role in the dynamics. The time to the next event is drawn from an exponential distribution with rate R = \sum_i r_i, and the event itself is chosen at random from all N possible events proportionally to the rates, i.e., event j is chosen with probability r_j/R. Typical simulation plots are shown in Figure 1.
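The steps above can be sketched directly. This is a minimal, unoptimised illustration of the scheme described in the text (names are ours); real implementations typically update rates incrementally rather than recomputing them at every event:

```python
import numpy as np

def gillespie_sir(adj, tau, gamma, i0=1, rng=None):
    """Gillespie simulation of SIR on a network given as adjacency lists.
    Returns event times and the number of infected nodes after each event."""
    rng = np.random.default_rng() if rng is None else rng
    N = len(adj)
    state = np.zeros(N, dtype=int)              # 0 = S, 1 = I, 2 = R
    state[rng.choice(N, size=i0, replace=False)] = 1
    t, times, infected = 0.0, [0.0], [i0]
    while True:
        # Per-node rates: susceptible with m infected neighbours -> tau*m,
        # infected -> gamma, recovered -> 0.
        rates = np.zeros(N)
        for v in range(N):
            if state[v] == 0:
                rates[v] = tau * sum(state[u] == 1 for u in adj[v])
            elif state[v] == 1:
                rates[v] = gamma
        R = rates.sum()
        if R == 0:                               # epidemic has died out
            break
        t += rng.exponential(1.0 / R)            # time to next event
        v = rng.choice(N, p=rates / R)           # event j with probability r_j/R
        state[v] += 1                            # S -> I or I -> R
        times.append(t)
        infected.append(int((state == 1).sum()))
    return times, infected
```

For example, on a degree-2 regular network (a cycle), the epidemic always terminates with zero infected nodes.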

Data: real epidemic data
In addition to assessing the robustness of the inference schemes on synthetic data for which ground truth is known, we considered real-world outbreak data from three different data sets: 1. The 2001 foot-and-mouth disease (FMD) outbreak in the UK. The 2001 FMD outbreak in the UK started towards the end of February 2001 and ended in September 2001, impacting more than 2000 farms. Control efforts resulted in the culling of millions of livestock [3]; see Figure 15.
2. The A(H1N1) outbreak on the Washington State University (WSU) campus at Pullman, Washington. In April 2009, there was an outbreak of influenza virus in Veracruz, Mexico. After this initial outbreak, a new strain of the virus, A(H1N1)pdm09, started to spread around the world in the autumn. See [19,9] for more details about this triple-reassortment virus, which spread even among young, healthy adults. As a result, multiple outbreaks on college campuses were seen, one of which was on the WSU campus in Pullman, Washington in late August 2009. Within the space of three months, almost 2300 students came to the campus health centre with influenza-like illnesses that were treated as influenza A(H1N1) infections. Figure 15 shows the daily new cases starting on 22 August 2009.
3. The third wave of COVID-19 in India. The COVID-19 pandemic has killed millions of people across the globe. Here, we consider the third wave in India. As with the other two datasets, we have daily incidence and prevalence of cases, recoveries and deaths from 15 February 2021 to 30 June 2021 (see Figure 15).

Inference methods
While most inference methods are based on the optimisation of a likelihood function, the likelihood function itself can be formulated based on different considerations of the underlying model and data. The most direct method typically focuses on matching model output and data as closely as possible, i.e., it is an error minimisation process. More sophisticated methods consider the underlying stochastic model more directly and involve the timing of events, even if simplifying assumptions may be needed. To ensure that our investigation into the possibility of inferring epidemic and network parameters using the pairwise model is not affected or biased by the inference scheme used, we consider two different methods, as described below.

Maximum-likelihood-based approach
In order to fit data produced by the PW model with the likelihood based on the PW model, we simply test how well the true parameters can be recovered.This scenario does not require any approximation.When fitting to stochastic data from an exact epidemic or a real epidemic, however, we are making the assumption that the exact forward model can be approximated by the PW model.
In this paper, we use the negative binomial distribution as the likelihood of choice because of its flexibility. The distribution models the number of failures observed before a target number of successes, m, each with success probability p. We parameterise it such that, at observation time t_i, the distribution has mean y_θ(t_i) and variance y_θ(t_i) + k y_θ(t_i)^2, with k > 0 the dispersion parameter, which we also attempt to infer. This yields the following likelihood

L_{NB}(θ, k) = \prod_{i=1}^{n} NB(y_i; y_θ(t_i), k),

where y_θ(t_i) is the model-predicted daily new cases count under parameters θ. Using L_{NB} effectively decouples the mean and the variance of the distribution describing the data. This is expected to be sufficient to capture the variability of the data resulting from either natural stochasticity or variability due to how data were collected.
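A minimal sketch of this negative log-likelihood, under our assumed parameterisation m = 1/k and p = m/(m + μ), which reproduces mean μ and variance μ + kμ² (function names are ours):

```python
import numpy as np
from scipy.stats import nbinom

def neg_log_lik(counts, model_means, k):
    """Negative binomial negative log-likelihood: observed daily count y_i
    has mean y_theta(t_i) and variance y_theta(t_i) + k * y_theta(t_i)**2."""
    mu = np.maximum(np.asarray(model_means, dtype=float), 1e-9)
    r = 1.0 / k               # SciPy's 'n' parameter (number of successes)
    p = r / (r + mu)          # success probability yielding the target mean
    return -nbinom.logpmf(np.asarray(counts), r, p).sum()
```

By construction, the value is smaller when the model means track the observed counts than when they are far off, which is what the Nelder-Mead search described below exploits.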
Parameter estimation was performed by minimising the negative log-likelihood using the widely used direct-search Nelder-Mead method. Because this technique can converge to non-stationary points, multiple initial conditions (15) were used for each estimation process. To avoid biasing the search, initial conditions were drawn using Latin hypercube sampling, maximising the minimum distance between points. Because Latin hypercube sampling cannot prevent inappropriate parameter settings, initial conditions were only accepted if the ratio τ/γ was not too large. Specifically, we enforced that the denominator in the expression of τ, i.e., n − 1 − R_0, was greater than or equal to 1.5 (chosen empirically). On average, 10 out of 15 initial conditions survived. Code and data used to produce the results in Section 5 are available from https://github.com/berthouz/EpiPWMInf for fitting the ODE realisations with negative binomial noise, https://github.com/berthouz/EpiPWMInfwI0 for fitting the Gillespie stochastic realisations, and https://github.com/berthouz/EpiPWMInfwI0N for fitting the real-world datasets.
The key difference between these will be explained in the relevant results sections.
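The multi-start scheme can be sketched as follows (our own illustration, not taken from the repositories above). Note that SciPy's Latin hypercube sampler does not directly maximise the minimum inter-point distance, and the paper's additional acceptance criterion on n − 1 − R_0 is omitted here for brevity:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import qmc

def multi_start_fit(nll, lower, upper, n_starts=15, seed=0):
    """Minimise a negative log-likelihood with Nelder-Mead from several
    Latin-hypercube starting points; return the best result found."""
    sampler = qmc.LatinHypercube(d=len(lower), seed=seed)
    starts = qmc.scale(sampler.random(n_starts), lower, upper)
    best = None
    for x0 in starts:
        res = minimize(nll, x0, method="Nelder-Mead",
                       options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-8})
        if best is None or res.fun < best.fun:
            best = res
    return best
```

On a simple convex surrogate objective, the scheme recovers the minimiser to high accuracy; on the actual epidemic likelihood, the spread of the 15 runs is what reveals sub-optimal local minima.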

Dynamical Survival Analysis
Dynamical survival analysis (DSA) is a statistical methodology that has recently been developed in a series of papers [9,4,22,8] to address some of the shortcomings of traditional inference methods used in infectious disease epidemiology. In essence, the method combines classical dynamical systems theory with tools from survival analysis. The crux of the methodology lies in interpreting the mean-field ODEs (representing population proportions) as describing probability distributions of transfer times, such as the time to infection or the time to recovery. Such a change in perspective allows one to use population-level mean-field ODEs to describe the dynamics of scaled compartment sizes as well as to write a likelihood function for individual-level trajectories based on transfer times, which may be censored, truncated or even aggregated.
To apply the DSA methodology, let us first define s_t = [S](t)/N, the scaled susceptible fraction obtained from the pairwise model, with ρ denoting the initial ratio of infected to susceptible individuals. Interpreting s_t as an improper survival function, the density of the time to infection of a randomly selected initially susceptible individual, conditional on infection occurring within the observation window [0, T], is

h_T(t) = -\dot{s}_t / (1 - s_T), \quad t \in [0, T].
Given a random sample of infection times t_1, t_2, ..., t_n, the likelihood contribution of the infection times is given by

L_I(θ) = \prod_{i=1}^{n} h_T(t_i). \quad (19)

Note that DSA does not require knowledge of removal times. However, if individual recovery or removal times are known, they may be used to enhance the quality of inference. The likelihood contribution of a random sample of individual recovery times \tilde{t}_1, \tilde{t}_2, ..., \tilde{t}_m is given by

L_R(θ) = \prod_{j=1}^{m} r_T(\tilde{t}_j), \quad (20)

where r_T is the density of the individual recovery times. The density r_T is a convolution of two densities: h_T for the time of infection and the density of an exponential distribution with rate γ corresponding to the infectious period. In practice, it is convenient to differentiate the density r_T(t) with respect to t, yielding \dot{r}_T = γ(h_T − r_T), and then solve a system of ODEs.
Finally, the DSA likelihood function based on a random sample of infection times t_1, t_2, ..., t_n and a random sample of recovery times \tilde{t}_1, \tilde{t}_2, ..., \tilde{t}_m is given by

L(θ) = L_I(θ) \times L_R(θ). \quad (21)

For practical convenience (and as with the MLE-based approach), we work with the log-likelihood function, i.e., the logarithm of the likelihood function, rather than the likelihood function itself. It is, of course, possible to maximise the DSA likelihood function in equation (21) to obtain point estimates of the parameter set (ξ, τ, γ, ρ). Such a procedure would then be called a maximum likelihood approach, and the difference between the two inference schemes discussed here would simply be that they maximise two different likelihood functions. An alternative way to perform parameter inference using DSA is to adopt a semi-Bayesian approach via a Laplace approximation to the posterior. In this paper, we adopted a fully Bayesian approach. Specifically, we drew posterior samples of (ξ, τ, γ, ρ) using a Hamiltonian Monte Carlo (HMC) scheme implemented in the Stan programming language [21] interfaced with R. The code will be made available upon request.
Some of the datasets used in this paper (see relevant sections) provide daily new infection cases, rather than infection and/or recovery times. As mentioned earlier, the DSA methodology does not require knowledge of removal times. When these are not available, one can simply work with the likelihood function L_I (or the corresponding log-likelihood) in equation (19). Infection times, in turn, can be constructed from daily new cases as follows: if we observe 10 new cases on day t, then we simply draw 10 random samples from a uniform distribution over [t − 0.5, t + 0.5]. By repeating this procedure for all days for which daily new case counts are available and combining the individual infection times (samples from the uniform distributions), we can transform the original count data into data on infection times. A random sample of those infection times can then be fed into the likelihood function L_I in equation (19). In datasets in which daily recoveries are available, we can construct individual recovery times in a similar fashion: if we observe 5 recoveries on day t, we draw a random sample of size 5 from a uniform distribution over [t − 0.5, t + 0.5]. We repeat this procedure for all days for which the daily number of recoveries is available, and then combine the individual recovery times. A random sample of this data on individual recovery times is then fed into the likelihood function L_R in equation (20).
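This count-to-time conversion is straightforward to implement; a minimal sketch (function name is ours):

```python
import numpy as np

def counts_to_times(daily_counts, rng=None):
    """Spread each day's count uniformly over [t - 0.5, t + 0.5] to obtain
    approximate individual event (infection or recovery) times."""
    rng = np.random.default_rng() if rng is None else rng
    times = []
    for t, c in enumerate(daily_counts, start=1):
        times.extend(rng.uniform(t - 0.5, t + 0.5, size=int(c)))
    return np.sort(np.array(times))
```

The same helper serves both infection counts and recovery counts, since the jittering procedure is identical in the two cases.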

ML-based inference using data produced by the PW model
As a very first step toward assessing the ability of the inference scheme to recover the expected parameters, we first fitted the PW model (see Eqs. (9)-(13)) to daily cases data generated by the PW model and contaminated by noise whose dispersion was manipulated as described below.
The top row of Figure 2 shows the histograms of parameters obtained when fitting M = 1000 data series generated by solving Eqs. (9)-(13) with true [R_0, n, γ, I_0] = [2, 6, 1/14, 1] and N = 10000. Here, noise was simulated according to Eq. (15) using k = 0.0005 (i.e., very low dispersion). These results confirm that the mean values are close to the true parameters, which is expected because the value of k is very small.
To illustrate the sensitivity of the estimation process to the value of the dispersion parameter, we repeated the fitting process when considering 5 levels of dispersion, from 0.0005 to 0.01.As shown by the bottom left panel in Figure 2, as the dispersion level increases, so does the range of inferred R 0 values.Nevertheless, the mean estimated value remains close to the true value in all cases.
Likewise, we found the inference process to be robust to the choice of time horizon (full epidemic t max = 150, partial epidemic including the peak t max = 80, epidemic up to the peak t max = 70, partial epidemic not including peak t max = 60).As shown by the bottom right panel in Figure 2, as the time horizon reduces, the range of inferred R 0 values increases but the average remains close to the true value.Importantly, whilst the inclusion of the peak does narrow the range of inferred values, it is not necessary for the inference process to correctly recover the expected value of R 0 .Note that an arbitrary cut-off of n < 500 was used for clarity of the plot.

Identifiability
As Fig. 3 shows, the inferred values of τ and n describe a hyperbola-like curve, which indicates a clear identifiability problem; that is, the values of τ and n cannot be disentangled. However, we make two important remarks. First, it is possible to characterise this hyperbola analytically.
Second, the values of τ and n combine favourably into the expression of R 0 whose inferred values are well behaved, see bottom panels in Fig. 2.
To formally characterise the hyperbola, we rely on quantities that can be derived analytically from the PW model: the leading eigenvalue (or growth rate, under some transformation) and the final epidemic size. Linear stability analysis of the pairwise model at the disease-free equilibrium gives the growth rate λ = τ(n − 2) − γ, so that the set of (τ, n) pairs sharing a given growth rate λ*_L is given by

τ(n) = (λ*_L + γ)/(n − 2).

Similarly, the implicit formula for the final epidemic size, which can be found in [13] and is used here with initial conditions corresponding to the disease-free steady state, defines a second curve of (τ, n) pairs sharing the same final size s*_∞ = S*_∞/N. Here, λ*_L and s*_∞ are obtained by setting all parameters to some desired values, (n, τ, γ) = (n*, τ*, γ*); note that often R_0 rather than τ is given, with knowledge of either being sufficient to have a well-defined system.
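Assuming the pairwise growth rate takes the form λ = τ(n − 2) − γ (obtained from linearising the closed pairwise model at the disease-free equilibrium; valid for n > 2), the family of (τ, n) pairs sharing a given growth rate can be traced as follows (a sketch; the function name is ours):

```python
def tau_from_growth_rate(n, lam, gamma):
    """Hyperbola of (tau, n) pairs sharing the pairwise growth rate
    lam = tau * (n - 2) - gamma (requires n > 2)."""
    return (lam + gamma) / (n - 2)
```

By construction, every point on this curve reproduces the same early exponential growth, which is one reason the fitting procedure cannot disentangle τ and n from incidence data alone.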

ML-based inference using data from exact stochastic simulations
Five hundred Gillespie realisations were generated using parameters [R_0, n, γ, I_0] = [2, 6, 1/7, 1] and N = 10000. Of these 500 realisations, M = 370 did not die out. Figure 4 shows the histograms of the parameters estimated from fitting those realisations. Unlike with noisy realisations of the ODE, we also subjected I_0 to the inference process. Results (not shown) obtained when assuming I_0 = 1 during estimation revealed that the inclusion of I_0 in the estimation process was key to being able to account for the stochasticity in the onset of the epidemic, or more precisely, the time elapsed before the growth becomes exponential. For the purpose of initialising Latin hypercube sampling, values were taken in [0.01, 10]. This particular choice has no bearing on our findings (results not shown). The mean of the estimated I_0 was found to be 1.355, i.e., close to the expected 1; however, it showed a broad distribution of values, ranging from 0.012 to 5.534. Comparing these histograms to those shown in Figure 2, we find that whilst the mean estimated values do not significantly differ, the variance in estimation is, not surprisingly, substantially larger. To quantify this more precisely, we calculated the mean (and standard deviation) of the confidence interval widths on R_0 over all M = 370 realisations. Specifically, we determined the nominal 99% profile likelihood confidence interval widths for R_0 as described in [11]. Confidence interval widths are 0.534 ± 0.203, compared to 0.498 ± 0.071 when fitting the ODE realisations with noise (dispersion level k = 0.0005). These results are representative of those obtained when calculating confidence intervals for the other parameters (not shown).

Inference based on DSA
Before describing the results of DSA on the synthetic data, we highlight that, unlike the MLE-based approach, which either assumes or infers both the population size and the initial number of infected individuals (see also Section 5.5.1), DSA inherently assumes an infinite population size (for both susceptible and infected individuals). Therefore, we do not infer the initial number of infected individuals. However, the ratio of initially infected to susceptible individuals, the parameter ρ, can be meaningfully inferred. In fact, having observed a finite number of infections in a given observation window [0, T], DSA is also able to infer an effective population size using the discount estimator [9,4]

\hat{n}_T = k_T / (1 - s_T),

where k_T is the number of infections observed by time T > 0 and s_T is the model's susceptible fraction at time T. It should be noted that estimates of the effective population size depend on the observation time T, and could be substantially different from the true population size when applying the method to a real epidemic. Nevertheless, as evidenced by the posterior distributions of the parameters (τ, R_0, n, γ, ρ, n_T) shown in Figure 5, for this synthetic dataset, the method is able to infer the parameters well. The posterior distributions are unimodal and centred around the true values of the parameters. Here, random samples of individual infection and recovery times (of size 5000 each) were first constructed from the count dataset (a single trajectory of the Gillespie simulation) by drawing samples from appropriate uniform distributions (see Section 4.2). These random samples were then fed into the HMC scheme using four parallel Markov chains. Uninformative, flat priors were used. For this dataset, the parameter values estimated by both approaches are comparable. However, it is important to note that the two methods adopt quite different likelihood constructions. Whilst the MLE-based approach relies on counts and the size of the population to construct the likelihood function, the DSA likelihood function only requires a random sample of infection times (and recovery times, if available). In other words, whilst the MLE-based approach assigns a likelihood to the epidemic trajectories, DSA identifies the probability laws of individual transfer times (infection and recovery times). These, even if censored or truncated, are often more reliable and more easily observed or derived statistical data than counts. For instance, even with partially observed count data on daily new infections, one can create a random sample of infection times (possibly censored/truncated). Even when the entire population is not monitored and only a set of randomly chosen individuals are followed through time with their transfer times noted, the DSA methodology is still applicable. This advantage of DSA is particularly important when we fit the PW model to real epidemic data, which we do in the next section.
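Assuming the discount estimator takes the form n̂_T = k_T/(1 − s_T), with s_T the model's susceptible fraction at the end of the observation window, it reduces to a one-line computation (a hypothetical illustration; the function name is ours):

```python
def effective_population_size(k_T, s_T):
    """DSA discount estimator: observed infections k_T divided by the
    model's probability 1 - s_T that an individual is infected by time T."""
    return k_T / (1.0 - s_T)
```

For example, observing 800 infections when the model predicts that 80% of individuals remain susceptible... would not be consistent; with s_T = 0.2, i.e., 80% infected by time T, the estimator returns an effective population size of 1000.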

System size and the MLE approach
In deploying the MLE approach to the above data, we used our knowledge of the true value of N. With real-world datasets, however, such information is typically not available. Whilst this is not an issue for DSA, since it can infer an effective system size, it is for the MLE-based approach, particularly in light of the unidentifiability issue discussed in 5.2. In what follows, we infer the value of N along with the other parameters, accepting that the increase in dimensionality of the parameter space will likely exacerbate unidentifiability. Here, we investigate the robustness of the inference process when inferring known parameters on the stochastic realisations presented in Section 5. In particular, the percentage error in N is under 1.5% (for reference, the percentage error for DSA on the same data is of the order of 0.01%). Nevertheless, as shown by Figure 6, there is substantial variance in the estimates, including significantly higher values of both N and R_0 (e.g., 70 estimates have R_0 > 4) despite excellent fits. To illustrate this point, we plotted the estimates on the (τ, n) plane (see Figure 7) and confirmed that they conform to the unidentifiability curve previously identified. The inset shows two stochastic realisations and the corresponding fits, with one fit producing an estimate for the degree n close to the true value (6) and one producing an estimate orders of magnitude larger (275).

FMD data
Unlike DSA, the MLE approach only provides a single point estimate. This makes it difficult to provide a meaningful comparison of the two methods. To mitigate this issue, we repeated the MLE inference process 100 times, each time using a different set of initial conditions. To construct the equivalent of a posterior, we included all parameter values obtained across the 100 rounds, provided the nLL was sufficiently close to the best nLL over the 100 rounds. The number of estimates excluded for each dataset will be reported; it highlights the fact that the search algorithm can get stuck in very sub-optimal local minima.
Histograms of inferred parameters for the FMD dataset using the MLE approach are shown in Figure 8. 11 out of 100 estimates were excluded because of an anomalous outcome of the inference process. The estimates with the lowest nLL are I_0 = 10.54, R_0 = 2.58, n = 153.67, γ = 0.0723, k = 0.010, and N = 1817.2. There is quite a bit of dispersion around the parameters, with fairly fat tails. For example, whilst the median for R_0 (2.71, see Table 1) is relatively close to the best estimate, we also observe some fairly large values (in fact, 10 out of 100 estimates were excluded because R_0 > 10). The best and median estimates for N were 1817 and 1747 respectively. This number is very likely implausible, as well over 2000 farms were involved in the epidemic; but see the DSA results below. Likewise, the inferred average degree seems far over-inflated. The value γ ≈ 0.07 implies an infectious period of about 14 days; note that previous studies, see [4] for example, have reported a mean of 10.2 days. Importantly, the fits are good, with all (accepted) estimates showing a very narrow range of nLL values (from 233.03 to 248.67, with a mean of 236.31 and a standard deviation of 4.06). This once again provides evidence that the MLE approach ascribes a likelihood to the trajectory produced by the inferred parameters rather than to the parameters themselves.
Figure 8: Distributions of [I_0, R_0, n, γ, k, N] using MLE on the FMD data, with 100 rounds of inference with different initial conditions. The median values are listed in Table 1 and compared with the DSA approach in Section 5.5.5. Five estimates for which n > 100 (154, 156, 279, 294 and 368) were excluded from the figure (but not the statistics) for improved readability of the histogram.
The posterior distributions obtained by DSA method on the FMD dataset are shown in Figure 9.
It is important to note that, unlike with the MLE approach, these results were obtained using an informative prior for the γ parameter (an exponential distribution with mean 10.2 days), following the analysis in [4]. The posterior distributions are unimodal. The mean estimates are consistent with previously reported values, for example in [4]. Interestingly, and as with the MLE approach, the estimated effective population size is less than 2000. This is not to be confused with the number of farms, however (see brief explanation in Section 5.5.5).

H1N1-N18234
The A(H1N1) dataset presents an interesting challenge, as it has a long, persistent tail with visible stochastic effects. We therefore present two sets of results: one where we infer parameters on the full dataset (i.e., including the tail) and one where we restrict to T = 42. Figure 10 shows the results of the MLE-based approach for both scenarios. As clearly evidenced by the bottom right panel of Figure 15, when the full horizon is considered, the fits are poor, the noisy tail seemingly obfuscating the true trajectory of the epidemic. Not surprisingly, the parameter estimates appear meaningless and highly variable from one round of inference to the other, despite similar nLL (see Table 1). When restricting to T = 42, the fits are good and the parameter estimates are somewhat better behaved, albeit not unimodal and with an implausibly large n considering the inferred population size N. In fact, only 51 out of 100 parameter estimates survived once we excluded 3 estimates for being poor fits, 13 for excessive values of R_0 (> 10) and 33 for excessive values of γ (> 1). Interestingly, we note the high value of k inferred in both scenarios, with MLE correctly recognising the high dispersion of the counts.
When deploying DSA, once again a prior was used for γ (γ^{-1} = 5.5) based on published literature (see [19,9]), and posteriors were obtained based on the full and the partial data respectively. As with the MLE-based approach, when fitting to the full data, the DSA fit is poor and, in fact, very similar to that of the MLE approach (see bottom right panel of Figure 15). When removing the noisy tail of the data, the quality of inference improves significantly, with both MLE and DSA producing near-identical fits (bottom left panel of Figure 15). However, unlike with the FMD dataset, the inferred parameters are quite different, although, interestingly, the ML-estimated population size and the DSA effective size are very similar (see Table 1).

COVID-19 in India
Figure 13 shows the histograms of the estimates obtained by the ML-based approach on the final dataset. Here, unlike with the previous dataset, there was high consistency between estimates over the 100 rounds, with no exclusions needed. Curiously, this homogeneity of results is associated with an apparent mismatch between the fitted model and the data, as shown by the top right panel in Figure 15.
As in the synthetic data study, random samples of individual infection and recovery times (of size 5000 each) were constructed from the count dataset. These random samples were then fed into the HMC scheme using four parallel Markov chains. Uninformative, flat priors were used. The posterior distributions of the parameters (τ, R 0 , n, γ, ρ, n T ) using the DSA method are shown in Figure 14. The estimated parameters correspond to probability distributions that have similar measures of central tendency as those reported in an earlier analysis of the data in [4].
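The exact construction of the individual-level samples from daily counts is not spelled out above; one plausible recipe (uniformly jittering events within their reporting day and subsampling without replacement) can be sketched as follows. The function name and the daily counts are illustrative, not taken from the data.

```python
import numpy as np

def sample_event_times(daily_counts, sample_size, rng):
    """Expand daily counts into individual event times by spreading each
    day's events uniformly within that day, then draw a random subsample
    without replacement (the kind of input the DSA/HMC scheme expects)."""
    days = np.repeat(np.arange(len(daily_counts)), daily_counts)
    times = days + rng.random(days.size)          # jitter within the day
    k = min(sample_size, times.size)
    return np.sort(rng.choice(times, size=k, replace=False))

rng = np.random.default_rng(0)
counts = np.array([120, 480, 910, 700, 260, 90])  # hypothetical daily new cases
infection_times = sample_event_times(counts, 500, rng)
```

The same routine would be applied separately to the daily recovery counts to produce the recovery-time sample.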
Interestingly, for both methods, the majority of the probability mass in the (posterior) distribution for the degree (n) is concentrated around small values, indicating a low contact pattern. This is in agreement with the various non-pharmaceutical interventions, such as lockdowns, that were put in place to reduce the spread of the virus. Finally, the ML-estimated population size and the DSA effective size are of the same order of magnitude (see Table 1).

Comparison across real-world datasets
Figure 15 shows the data for all three real-world outbreaks together with fits produced when taking the best parameter estimates using the ML-based approach and the median values of the posteriors produced by DSA. Whilst our investigation of the COVID-19 dataset supports a like-for-like comparison between inference schemes, there are differences in the way the analyses of the FMD and the A(H1N1) datasets were carried out. Specifically, whereas no prior was involved in the MLE-based approach, informative priors (based on published literature) were used for the Hamiltonian Monte Carlo scheme for DSA. This reflects an important and fundamental difference between the MLE-based approach and the DSA methodology (here implemented via a Hamiltonian Monte Carlo scheme), namely that the latter follows a Bayesian route. It should be noted, however, that the effect of the choice of priors should vanish in the limit of a large number of data points.
With this in mind, we can make several observations:
• In general, the fit to the real data is good except in two cases. In the COVID-19 data, despite relatively similar parameters between methods, the DSA fit appears to capture the trend of the data much better than the MLE fit, where a clear mismatch is observed. The scenario in which the full H1N1 epidemic is subjected to inference highlights the challenge of highly variable, potentially noisy, data, as well as the impact of the observation period. In particular, as shown by the bottom two panels of Figure 15, the longer observation window allows the long and noisy tail of the epidemic to dominate, with both approaches missing the rise and fall in the daily new cases.
• Table 1 provides two sets of estimates for the MLE approach. As indicated previously, for comparison purposes the MLE process was repeated 100 times using different initial conditions. The estimate denoted 'best' is therefore the 'true' MLE estimate (in the sense of being the one with maximum likelihood over all estimates of all rounds). Nevertheless, in the above, we kept all estimates provided their likelihood was close enough to that of the best one. In many cases, we observe a large difference between best and median. This is yet another manifestation of the unidentifiability problem, whereby vastly different values of the mean degree can result in likelihoods very close to the best one (i.e., with the same quality of fit). Interestingly, we note that, in general (a few estimates were excluded as per the text), unidentifiability did not affect R 0 as much as the other parameters.
• The estimates for I 0 and population size, N , are relatively similar across both inference approaches, except for A(H1N1) when the full dataset is considered and for COVID-19. For the A(H1N1) outbreak, the MLE method appears to overestimate N by a large margin. Note that the Washington State University campus is located in a relatively small town with a student population of around 18000 and a resident population of around 9000 [9]. For the COVID-19 wave in India, the DSA median estimate of 5204 for I 0 appears smaller than the true count of 11592 new cases on 16 February 2021, whereas the MLE method seems to overestimate it (median and best estimates of 33682 and 33130, respectively). It should be noted that the effective population size is a by-product of the DSA method (see Section 5.4). Strictly speaking, the parameters I 0 and N are far less meaningful in DSA than in MLE, which requires them. However, keeping track of the DSA estimates n T of the effective population sizes at times T is valuable in that it gives us a sense of the possible size of the epidemic and, therefore, could be used for monitoring an ongoing epidemic [8].
• Comparing the distributions obtained by DSA and MLE for the FMD data, we find the range of average degree obtained by DSA to be much better behaved than that obtained by MLE, with mean and median being close and with a numerical value that seems more realistic. This observation holds for all datasets, with DSA producing more realistic estimates. This is ultimately linked to the fundamental difference between how the likelihoods in the MLE and DSA approaches are formulated. Whilst the MLE method simply minimises the mismatch between model trajectory and data, the DSA likelihood captures the underlying probability laws of individual infection and recovery times. More specifically, it models the underlying survival function through the [S](t) curve parameterized by (n, τ, γ, ρ) (and, implicitly, by the observation time T ).
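The multi-start MLE procedure discussed above (repeat the optimisation from random initial conditions, keep all rounds whose nLL is within a tolerance of the best, then compare the 'best' estimate against the median of the kept ones) can be sketched as follows. The objective below is a deliberately simple stand-in for the actual pairwise-model likelihood; note that it depends on τ and γ only through their difference, so it mimics the unidentifiability and best-versus-median spread discussed in the text.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(theta, data):
    """Placeholder objective: in the paper this would integrate the pairwise
    ODEs and score the daily counts; here, a least-squares surrogate."""
    tau, gamma = np.exp(theta)        # log-parameterisation keeps rates positive
    t = np.arange(len(data))
    model = data[0] * np.exp((tau - gamma) * t)
    return float(np.sum((model - data) ** 2))

def multistart_mle(data, n_rounds=50, tol=2.0, seed=1):
    """Repeat MLE from random starts; keep rounds with nLL within `tol`
    of the best; report the best estimate and the median of kept ones."""
    rng = np.random.default_rng(seed)
    rounds = []
    for _ in range(n_rounds):
        res = minimize(neg_log_lik, rng.normal(scale=0.5, size=2),
                       args=(data,), method="Nelder-Mead")
        rounds.append((res.fun, np.exp(res.x)))
    best_nll, best = min(rounds, key=lambda r: r[0])
    kept = np.array([p for nll, p in rounds if nll <= best_nll + tol])
    return best, np.median(kept, axis=0), kept.shape[0]

data = 10 * np.exp(0.3 * np.arange(15))   # synthetic exponential growth
best, median, n_kept = multistart_mle(data)
```

Because only τ − γ is constrained by this surrogate, many (τ, γ) pairs achieve near-identical nLL, which is exactly why the best and median estimates can differ substantially while fitting equally well.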

Discussion
In this paper, we have investigated the ability of a network-based mean-field model, i.e., the pairwise model, to infer not only disease parameters but also some properties of the underlying network. Outbreak data encapsulate the interplay between the contact network and epidemic spreading. However, daily new cases and other such data incorporate network information only implicitly. Hence, it is interesting to investigate whether one can learn about the underlying contact network from such data. Several challenges arise; for example, an epidemic with a small transmission rate on a dense network may look very similar to an epidemic with a large transmission rate spreading on a sparser network. Hence, it is not a given that outbreak data hold a specific enough signature of the contact network.
In fact, our investigation revealed an anticorrelation between the value of the transmission rate and the density of the network. Regardless, the estimates of both parameters peaked at around the desired values, especially when the ground truth was known.
While the pairwise model used in the paper assumes that the network is regular and only accounts for the number of links each node has, it is possible to relax this seemingly restrictive assumption.
In [8], DSA was used for an SIR epidemic on a configuration model network with Poisson degree distribution. Recently, it has been shown [12] that the pairwise model remains exact for networks with binomial, Poisson or negative binomial degree distributions; see also [10, Corollary 1, Section 5.2], where a similar result was derived for a susceptible-infected (SI) process on configuration model random graphs. The difference in the degree distributions manifests itself in the PW model via the type of closure one uses. For example, if the underlying network has a Poisson degree distribution, then ξ is simply set to ξ = 1, and the parameter of the Poisson distribution, and hence the network, enters the PW model via the initial conditions. A similar modification is possible for networks where the degree distribution is negative binomial, thus separating the mean from the variance. These all offer extensions and improvements above and beyond what the PW model was able to capture about the network. Moreover, employing the edge-based compartmental model, another network-based mean-field model which uses the probability generating function corresponding to the degree distribution of the network, makes it possible to aim for learning the degree distribution of the network itself.
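The way the degree distribution enters the closure can be summarised through the factor ξ = ⟨k(k − 1)⟩/⟨k⟩² = 1 + (σ² − μ)/μ², which reproduces the regular and Poisson cases mentioned above; a small sketch (the function name is ours):

```python
def closure_factor(mean_deg, var_deg):
    """xi = <k(k-1)>/<k>^2 = 1 + (var - mean)/mean^2, the factor entering
    the pairwise closure [XSY] ~ xi [XS][SY]/[S]."""
    return 1.0 + (var_deg - mean_deg) / mean_deg ** 2

# regular degree n:  variance 0       ->  xi = (n-1)/n
# Poisson(lambda):   variance = mean  ->  xi = 1
# negative binomial with mean mu, dispersion r:
#                    variance = mu + mu**2/r  ->  xi = 1 + 1/r
xi_regular = closure_factor(6, 0.0)                   # 5/6 for n = 6
xi_poisson = closure_factor(4.0, 4.0)                 # 1
xi_negbin  = closure_factor(3.0, 3.0 + 3.0**2 / 2.0)  # 1 + 1/2 for r = 2
```

The negative binomial case makes the separation of mean and variance explicit: only the dispersion r moves ξ away from 1.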

Figure 1 :
Figure 1: Prevalence based on Gillespie simulations. Thin lines/cloud in grey are the outcome of ∼ 100 individual realisations (10 networks with 10 realisations each) of an SIR stochastic epidemic on regular networks (n = 6), with their average plotted in thick red lines. Epidemics are started with I 0 = 100 (left panel) and I 0 = 250 infectious nodes chosen at random (middle and right panels), and only epidemics that reach 2I 0 are kept and averaged over. The numerical solution of the corresponding pairwise model is plotted as a continuous black line. All networks have N = 10000 nodes and the recovery rate is γ = 1. From left to right, τ takes values 0.3, 0.4 and 0.5, respectively.
initial condition [D](0) = nρ and [S](0) = 1, where, as before, ξ = (n − 1)/n and [S] satisfies the pairwise mean-field equation with [S](0) = 1 and [I](0) = ρ. The reason we normalize the system so that [S](0) = 1 will become clear when we describe the DSA likelihood. Now, dividing the above equation by [Ṡ] = −τ [S][D], solving for [D] in terms of [S] with initial condition [S](0) = 1, and then substituting the solution back into [Ṡ] = −τ [S][D], we obtain a single scalar ODE for [S] (whose right-hand side involves ξ), with initial condition [S](0) = 1. In essence, DSA interprets the susceptible curve as an improper survival function for the time to infection of a randomly chosen initially susceptible individual. That is, [S](t) = P(T I > t), where the random variable T I describes the time to infection. Because [S](t) is interpreted as a survival function, we set [S](0) = 1. This survival function is improper because lim t→∞ [S](t) = P(T I = ∞) > 0. However, we can transform it into a proper survival function by conditioning on a final observation time T ∈ (0, ∞). We define the probability density function f T on [0, T ] as f T (t) = −[Ṡ](t)/(1 − [S](T )).
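A minimal numerical sketch of this construction, using the standard closed pairwise SIR equations for a regular network (parameter values are illustrative, matching the right panel of Figure 1; variable and function names are ours):

```python
import numpy as np
from scipy.integrate import solve_ivp

def pairwise_sir(t, y, tau, gamma, n):
    """Closed pairwise SIR for a regular network of degree n, with
    closure [XSY] ~ xi [XS][SY]/[S] and xi = (n-1)/n (per-capita scale)."""
    S, I, SS, SI = y
    xi = (n - 1) / n
    return [-tau * SI,
            tau * SI - gamma * I,
            -2 * tau * xi * SS * SI / S,
            tau * xi * (SS - SI) * SI / S - (tau + gamma) * SI]

def dsa_survival_and_density(tau=0.5, gamma=1.0, n=6, rho=0.01, T=10.0):
    """Survival function s(t) = [S](t)/[S](0) and the proper density
    f_T(t) = -s'(t)/(1 - s(T)) obtained by conditioning on infection
    occurring before the observation time T."""
    y0 = [1 - rho, rho, n * (1 - rho) ** 2, n * (1 - rho) * rho]
    t = np.linspace(0.0, T, 2001)
    sol = solve_ivp(pairwise_sir, (0.0, T), y0, args=(tau, gamma, n),
                    t_eval=t, rtol=1e-8, atol=1e-10)
    S, SI = sol.y[0], sol.y[3]
    s = S / S[0]
    f = (tau * SI / S[0]) / (1.0 - s[-1])   # -s'(t) = tau*[SI]/[S](0)
    return t, s, f

t, s, f = dsa_survival_and_density()
```

By construction, f integrates to one on [0, T], so it can serve directly as a likelihood for sampled individual infection times.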

Figure 3 :
Figure 3: Scatter plots of the parameter estimates on the (n, τ ) plane with the two unidentifiability curves calculated as per Eqs. 22 (dotted line) and 23 (dashed line). The star denotes the true values, i.e., the true n and the calculated value of τ given the true values of R 0 and n. Main panel: scatter plot when the full epidemic is used for inference. Inset: scatter plot when the time horizon does not include the peak, i.e., t max = 60. Note that an arbitrary cut-off of n < 500 was used for clarity of the plot.

Figure 6 :
Figure 6: Inferring distributions for [I 0 , R 0 , n, γ, k, N ] for the stochastic realisations. The ground truth parameter values (R 0 , n, γ and N ) are denoted by vertical dashed lines. Data shown correspond to 289 out of the 370 stochastic realisations (see details in the text).

Figure :
As shown by the figure (as well as the likelihood values), the fits are equally excellent. Inferred parameters for the data with the expected degree were: I 0 = 2.38, R 0 = 2.18, n = 6.05, γ = 0.111 and N = 9689.76, i.e., close to the ground truth. In contrast, the inferred parameters for the data with the large degree were: I 0 = 0.24, R 0 = 10.56, n = 274.92, γ = 0.024 and N = 9281.53.

Figure 7 :
Figure 7: Main panel: scatter plot of the parameter estimates on the (n, τ ) plane with the two unidentifiability curves calculated as per Eqs. 22 (dotted line) and 23 (dashed line). The star denotes the true values, i.e., the true n and the calculated value of τ given the true values of R 0 and n. Only those estimates that did not provide a good fit (as per the criterion above) were excluded, resulting in 304 surviving estimates. Inset: empirical data and fit for two stochastic realisations corresponding to the triangles in the main panel, with two significantly different inferred degrees n (see details in the text).

Figure 10 :
Figure 10: Distributions of [I 0 , R 0 , n, γ, k, N ] using MLE on the H1N1 data (with horizon restricted to 42, top panel, and full data, bottom panel) using 100 rounds of inference with different initial conditions. The median values are listed in Table 1.

Figure 13 :
Figure 13: Distributions of [I 0 , R 0 , n, γ, k, N ] using MLE on the COVID-19 dataset using 100 rounds of inference with different initial conditions. The median values are listed in Table 1.

Figure 15 :
Figure 15: Illustration of the real-world outbreak data (top left: 2001 FMD outbreak in the UK; top right: third wave of COVID-19 in India; bottom panels: H1N1 outbreak with short (left) and long (right) horizons) together with output from the pairwise model with point estimates from MLE (values with best likelihood) and DSA (median values). All parameter values are given in Table 1.

Table 1 :
Summary statistics of the inferred parameters for the three empirical datasets considered in this study when using both the MLE and DSA approaches. Estimates for I 0 and N were rounded to the nearest integer for readability.