A dynamic extreme value model with applications to volcanic eruption forecasting

Extreme events such as natural and economic disasters leave lasting impacts on society and motivate the analysis of extremes from data. While classical statistical tools based on Gaussian distributions focus on average behaviour and can lead to persistent biases when estimating extremes, extreme value theory (EVT) provides the mathematical foundations to accurately characterise extremes. In this paper, we adapt a dynamic extreme value model, recently introduced to forecast financial risk from high-frequency data, to the context of natural hazard forecasting. We demonstrate its wide applicability and flexibility using a case study of the Piton de la Fournaise volcano. The value of using EVT-informed thresholds to identify and model extreme events is shown through forecast performance.


I. INTRODUCTION
NATURAL hazards and extreme financial losses can be seen as extreme events, i.e. events which have low probabilities of occurring under normal circumstances. Because their occurrences are imbalanced, we usually have more data on common, non-extreme events than on extreme events. However, it can be shown that fitting a distribution to all the data via classical statistical methods leads to a reasonable fit to the bulk of the data at the expense of a poor fit to the tails in which the extremes lie [1]. Extreme value theory (EVT) and its corresponding models seek to remedy this disparity by offering guidance on when an extreme regime kicks in and how it can be modelled (see [2], [3]). For example, under known conditions, the distribution of excesses above a fixed threshold can be shown to converge to a Generalised Pareto Distribution (GPD) as the threshold increases. By knowing this asymptotic distribution, we can conduct model checks and generate sensible estimates of extremal behaviour. EVT is increasingly used in financial applications and has been shown to give more accurate tail-risk predictions [4], [5]. To address the time dependence in financial returns, which goes against the traditional assumption of independence [6], [7] propose a dynamic extreme value model which uses high-frequency realized measures of the daily asset price variation as covariates to model the probability of exceeding a high threshold and the size of the excesses. Since the realized variations are time-varying, the estimates of threshold exceedance and excesses are also time-varying.

M. Nguyen is with the Asian School of the Environment, Nanyang Technological University (NTU), Singapore 637459 (email: michele.nguyen@ntu.edu.sg).
A. E. D. Veraart is with the Department of Mathematics, Imperial College London, London SW7 2AZ, United Kingdom.
B. Taisne and D. Lallemant are with the Asian School of the Environment and also with the Earth Observatory of Singapore, NTU.
Tan Chiou Ting was with the Earth Observatory of Singapore, NTU.
The extreme value model developed by [7] can be adapted to wider settings. In this paper, we demonstrate this by adapting the model for natural hazard forecasting, specifically for volcanic eruptions. Just as we can define extreme loss as an exceedance over some financial threshold, we can define extreme volcanic activity as the threshold exceedance of some index, e.g. an energy index. The contributions of this model to the eruption forecasting literature are manifold. While a short overview of existing methods is provided in the Supplementary Information, we highlight the key differences and contributions in the next section.

A. Contributions of the proposed model
The full and varied potential of machine learning algorithms for short-term eruption forecasting has yet to be realised [8]-[10]. The proposed dynamic extreme value model draws on techniques from machine learning, time series analysis and EVT. To adapt it from its original financial context to wider settings, we need to decide on: 1) a suitable index to compute threshold exceedances of; 2) the look-ahead window or forecast horizon; 3) the auxiliary information or covariates used to inform future behaviour; and 4) the time periods from which we compute these, i.e. the covariate window. In the seismic context, threshold exceedance for event detection is synonymous with first-arrival picking algorithms such as the classical short-time-average over long-time-average (STA/LTA) method and a recently introduced method using trace envelopes [11]-[13]. Since trace envelopes can be interpreted as the amount of energy in the signal, in our demonstration we will use them as eruption indices to take threshold exceedances of. These exceedances would hopefully relate to extreme regimes leading up to volcanic eruptions and can be forecasted using covariates. Note that by modelling the exceedance of an eruption index rather than the raw monitoring signal (as done by [14] via a Bayesian event tree), we avoid the need to interpret estimates in real time. Since different frequency bands within the seismic signal represent different physical phenomena [15], we will consider envelopes of frequency-filtered data. While some existing techniques like the failure forecast method estimate the eruption onset time directly, others, including event trees, define a look-forward window within which we estimate the probability of an eruption occurring. We take the latter approach. In particular, for illustration, we produce one-hour-ahead forecasts to complement other, longer-term forecasts. Although a longer lead time allows for more time to make emergency management decisions, notify the public and implement evacuations [16], forecasts are typically more accurate when made closer to the actual eruption time due to temporal divergence at larger lags (see, for example, [17]). Eruption forecasting methods such as event trees, belief networks and process/source models presuppose precursors or associations between source mechanisms and time series signals [18]. In contrast, our proposed methodology selects combinations of covariates which could represent precursors for any volcano and type of eruption, provided we train the model on corresponding data. Specifically, we test covariates inspired by machine learning classification algorithms for seismic signals [8]. By combining these covariates, different aspects of the seismic data and relationships across different frequency bands are represented. An objective stepwise selection procedure is then used to determine which covariates are more informative for forecasting eruptions.

B. Paper structure
We continue in Section II by outlining the key components of the dynamic extreme value model introduced by [7]. In Section III, we introduce our case study, the Piton de la Fournaise volcano, and illustrate how the model can be adapted for eruption forecasting. By comparing the effect of different choices of the threshold on the model fit and training performance, we highlight the value of using EVT to guide threshold choice in Section IV. In Section V, we evaluate the broader applicability of the method by refitting the model using three training event sets and testing the calibrated model on both event and non-event sets. In Section VI, we conclude by discussing our results and suggesting future areas for research. The code used for the analysis is publicly available at https://github.com/ntu-dasl-sg/dynamic-EV-forecasting.

II. THE DYNAMIC EXTREME VALUE MODEL
Following [7], let {Y_t}_{t=1,...,T} denote a time series of an index where higher values are associated with extreme events. Based on the selected threshold u ∈ ℝ of the index, we define exceedances to be the binary indicators of whether the index is higher than the threshold and define the excesses to be the numerical values of the exceedances, i.e. how much we exceed by. The conditional probability that the index at time t, Y_t, exceeds u by some excess z > 0 given prior information available at time t, F_{t−1}, can be written as:

P(Y_t > u + z | F_{t−1}) = P(Y_t > u | F_{t−1}) P(Y_t − u > z | Y_t > u, F_{t−1}).    (1)

Here, φ_t | F_{t−1} = P(Y_t > u | F_{t−1}) represents a time-varying, Binomial exceedance probability which can be modelled with a logit function:

logit(φ_t) = log[φ_t / (1 − φ_t)] = ψ_0 + Σ_{i=1}^{p} ψ_i x_{i,t−1},    (2)

where x_{t−1} = (x_{1,t−1}, ..., x_{p,t−1}) denotes a vector of p covariates from the previous time step and is used to project the future probability. The parameters {ψ_i}_{i=0,...,p} can be estimated by maximising the likelihood function:

L(ψ_0, ..., ψ_p) = Π_{t=l+1}^{T} φ_t^{I_t} (1 − φ_t)^{1−I_t},    (3)

where l is the lag at which the covariates x_t become available and I_t is the indicator of an exceedance at time t (it is equal to 1 if there is an exceedance and 0 otherwise). This is equivalent to using a logistic regression to model the probability of threshold exceedance.
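As a concrete illustration of the exceedance component, the sketch below fits the logit model by maximising the Bernoulli likelihood directly. It is written in Python for illustration (the paper's released code is in R), and the index, covariate and true parameter values are simulated rather than taken from any real data.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Simulated exceedance indicators driven by one lagged covariate (illustrative).
T = 500
x = rng.normal(size=T)                              # covariate x_{t-1}, already lagged
p_true = 1.0 / (1.0 + np.exp(-(-2.0 + 1.5 * x)))    # true psi = (-2.0, 1.5)
I = rng.binomial(1, p_true)                         # exceedance indicators I_t

X = np.column_stack([np.ones(T), x])                # intercept + covariate

def neg_log_lik(psi):
    """Negative Bernoulli log-likelihood of the logit model (stable form)."""
    eta = X @ psi
    return np.sum(np.logaddexp(0.0, eta)) - I @ eta

res = minimize(neg_log_lik, x0=np.zeros(2), method="BFGS")
psi_hat = res.x
phi_hat = 1.0 / (1.0 + np.exp(-(X @ psi_hat)))      # fitted exceedance probabilities
```

In practice, any standard logistic regression routine performs the same maximisation; the explicit likelihood is shown here only to match the derivation above.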
In (1), the model for the excesses of the threshold is given by a Generalised Pareto distribution (GPD):

P(Y_t − u > z | Y_t > u, F_{t−1}) = (1 + ξ z / ν_t)^{−1/ξ}, z > 0.    (4)

The shape parameter ξ is estimated assuming a non-time-varying GPD for the excesses and is kept constant for stability. To account for the time-varying nature of the excess distribution, we model the scale parameter ν_t as a log-linear function:

log ν_t = κ_0 + Σ_{i=1}^{q} κ_i x_{i,t−1}.    (5)

When κ_i = 0 for i > 0, it is sufficient to model the distribution of the excesses statically. From the definition of a GPD with non-zero shape parameter, the density of an excess z_t is:

f(z_t; ξ, ν_t) = (1/ν_t)(1 + ξ z_t / ν_t)^{−1/ξ − 1},    (6)

where, for ξ < 0, the support is 0 < z_t < −ν_t/ξ, i.e. the excesses have an upper bound. The parameters {κ_i}_{i=0,...,q} can be estimated by maximising:

L(κ_0, ..., κ_q) = Π_{t=l+1}^{T} (1/ν_t)[1 + ξ z_t / ν_t]_+^{−1/ξ − 1},    (7)

where [x]_+ = max(0, x) and we add the time subscript to the excesses, z, to denote their temporal indices. Henceforth, we shall refer to the maximisation of (7) to estimate {κ_i}_{i=0,...,q} as GPD regression. When the estimated shape parameter ξ is not significantly different from zero, we set it equal to zero and use an exponential regression instead.
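The GPD regression step can be sketched in the same way: with the shape ξ held fixed, we maximise the GPD likelihood over the coefficients of the log-linear scale model. This Python sketch uses simulated excesses; the covariate, true coefficients and starting values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Simulated excesses z_t with a log-linear GPD scale (illustrative values).
n = 1000
x = rng.normal(size=n)                      # lagged covariate x_{t-1}
xi = -0.2                                   # shape, fixed as in the two-step scheme
nu = np.exp(0.5 + 0.8 * x)                  # true scale: log nu_t = 0.5 + 0.8 x_{t-1}
U = rng.uniform(size=n)
z = nu / xi * ((1.0 - U) ** (-xi) - 1.0)    # inverse-CDF sample from GPD(xi, nu_t)

def gpd_nll(kappa, x, z, xi):
    """Negative GPD log-likelihood with log nu_t = kappa_0 + kappa_1 * x_t."""
    nu_t = np.exp(kappa[0] + kappa[1] * x)
    arg = 1.0 + xi * z / nu_t
    if np.any(arg <= 0.0):                  # outside the GPD support ([x]_+ clause)
        return np.inf
    return np.sum(np.log(nu_t)) + (1.0 / xi + 1.0) * np.sum(np.log(arg))

# Start from a constant scale large enough to satisfy the support constraint.
k0 = np.log(-xi * z.max()) + 0.1
res = minimize(gpd_nll, x0=np.array([k0, 0.0]), args=(x, z, xi),
               method="Nelder-Mead", options={"maxiter": 2000})
kappa_hat = res.x
```

A derivative-free optimiser is used here because the support constraint makes the likelihood non-smooth at the boundary; this is one simple choice, not necessarily the one used in the paper's implementation.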

III. CASE STUDY: THE PITON DE LA FOURNAISE VOLCANO

A. Data
We will use the dynamic extreme value model to forecast eruptions at the Piton de la Fournaise volcano. Situated on La Réunion Island, Piton de la Fournaise is one of the most active basaltic volcanoes, with an average of one eruption every 10 months [19]. In addition to the existing seismic monitoring stations, 15 broad-band stations were installed on the volcano as part of the Understanding Volcano project in 2009-2010 [20]. The data collected are available at https://www.fdsn.org/networks/detail/YA2009/. For our analysis, we use data for four eruption events and three non-events obtained at the UV05 station between 2009 and 2010. This was the closest available station to the January 2010 eruptive center [21]. Table I shows the key characteristics of these events as outlined in [19]. To evaluate the forecast performance in Section V, we will use three of the events for training the model and test it on another event as well as the three non-events. Although an eruption also occurred on December 9th 2010, we did not use these data for training or testing because the location of the eruption (Flank N) is relatively far from the rest of the event locations and hence could skew our results. The non-event dates were chosen to be roughly halfway between the selected four events and represent quiet periods for which we should not expect threshold exceedances of our eruption indices.

B. Illustration of method
To illustrate the use of the dynamic extreme value model for eruption forecasting, we first apply the method to the January 2010 event data (Training event 3) since it is the best documented of the chosen events. Figure 1a shows the raw seismic signal for the 1-5Hz frequency-filtered data. The bottom plot focuses on January 2nd, when the eruption occurred; the dotted, dashed and bold blue vertical lines denote the start and end timings of the seismic crisis, the swarm and the eruption onset respectively. As documented in [19], a seismic crisis took place from 07:50 local time. Between 08:10 and 09:02, a seismic swarm with a high level of seismicity was recorded. This is reflected in the seismic signal as larger fluctuations in the readings between the dashed blue vertical lines. The swarm was followed by a relatively quiet phase that directly preceded the onset of the eruption at about 10:20 (the eruption is indicated by the continuous seismic tremor in the figure). The whole eruption was estimated to have lasted 9.6 days, ending at 00:05 on January 12th 2010.
1) Trace envelope as an eruption index: Before fitting the dynamic extreme value model for eruption forecasting, we need to decide on an eruption index whose threshold exceedances we will consider. We propose to use the trace envelope E_t for t = 1, ..., T. This can be seen as the instantaneous amplitude of the seismic trace s = (s_1, ..., s_T), and can be computed as follows:

1) First, we compute the discrete Fourier transform (DFT) of s (this is implemented by the R function 'fft'):

f_t = Σ_{j=1}^{T} s_j exp(−2πi (j − 1)(t − 1)/T), for t ∈ {1, ..., T},

where T is the length of the seismic trace. Set f = (f_1, ..., f_T).
2) Next, we compute the complex Hilbert fast Fourier transform (FFT) of s as:

s^H_t = IFT(f ∘ h)_t, for t ∈ {1, ..., T},

where ∘ denotes the elementwise product, the length-T series h = (1, 2, 2, ..., 2, 2, 1) if T is even and (1, 2, 2, ..., 2, 2, 2) if T is odd, and the inverse Fourier transform (IFT) is defined as:

IFT(f)_t = (1/T) Σ_{j=1}^{T} f_j exp(2πi (j − 1)(t − 1)/T),

for t ∈ {1, ..., T}.

3) For t ∈ {1, ..., T}, the trace envelope of the seismic trace s is defined as:

E_t = |s^H_t|.

The above steps are in line with those used within the R function 'envelope' in the IRISSeismic package. In [13], the first arrival travel time picking algorithm was based on the envelope in decibels. Thus, we use Y_t = 20 log_10(E_t) as our eruption index. Figure 1b shows the trace envelope time series computed from the 1-5Hz frequency-filtered data for the January 2010 event. We focus on the 1-5Hz frequency band because it is strongly associated with volcano-tectonic activity. We see that although there are some spikes at the start of January 1st, the envelope index remains relatively low, around 50 decibels, until the recorded seismic crisis on January 2nd (represented by the dotted blue vertical line in the bottom plot). From the time of the seismic crisis, the index increases to a first peak during the seismic swarm before waning slightly. From about 10:00, the index increases steadily to plateau slightly above 80 decibels, the timing of which coincides with the recorded eruption onset (represented by the bold blue vertical line in the bottom plot). The relationship between the increases in the index and the seismic events enables us to use threshold exceedances to forecast eruptions.
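The FFT steps above correspond to the standard construction of the analytic signal, which `scipy.signal.hilbert` implements (FFT, frequency-domain weighting, inverse FFT). A minimal Python sketch on a synthetic tone; the 100 Hz sampling rate matches the raw data, while the 2 Hz tone and its amplitude are arbitrary choices for illustration:

```python
import numpy as np
from scipy.signal import hilbert

fs = 100.0                                 # sampling rate (Hz), as for the raw data
t = np.arange(0, 10.0, 1.0 / fs)
s = 3.0 * np.sin(2.0 * np.pi * 2.0 * t)    # toy trace: a 2 Hz tone of amplitude 3

# scipy's hilbert() performs the FFT / weighting / inverse-FFT steps and
# returns the complex analytic signal; its modulus is the trace envelope.
E = np.abs(hilbert(s))
Y = 20.0 * np.log10(E)                     # eruption index in decibels
```

For a pure tone that fits an integer number of cycles into the window, the recovered envelope is the constant amplitude of the tone, which is a useful sanity check before applying the same transform to real traces.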
2) Forecast horizon and covariate window: In addition to computing the index to take exceedances of, we also need to decide on:
• The forecast horizon δ_t: the time period between the time window where the covariates are computed and the forecast time.
• The covariate window W: the time window within which past observations contribute to the covariates.
For illustration, we use δ_t = W = 1 hour so that we make 1-hour-ahead forecasts with one hour of past data to inform the covariates in the model. This mimics the settings in [8], [22] and [18], where either one-hour-long signals were classified or the covariates were generated by scanning a moving window of length one hour across the seismic signals. This framework is pictured in Figure 1c, where the time period between t − 1 and t is δ_t = 1 hour and the covariate window W = 1 hour.
Although we use the original, high-frequency (100Hz) data at different frequency bands (0.1-1 Hz, 1-5 Hz, 5-15 Hz, 0.1-20Hz and high pass 0.01Hz) to compute the covariates, we chose to produce forecasts only every 10 seconds to reduce unnecessary computational burden.For a practically useful workflow, this should be longer than the time required to compute the covariates from the past hour and use the fitted model to generate the forecast.
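One way to organise this rolling computation is sketched below: each forecast, issued every 10 seconds, uses covariates from the hour ending one hour earlier. The window sizes follow the paper's choices, while the placeholder signal and the two example covariates (standard deviation and energy) are hypothetical.

```python
import numpy as np

fs = 100                     # sampling rate in Hz, as for the raw data
W = 3600 * fs                # covariate window: one hour of samples
horizon = 3600 * fs          # forecast horizon: one hour ahead
step = 10 * fs               # issue a forecast every 10 seconds

rng = np.random.default_rng(2)
signal = rng.normal(size=3 * W)   # three hours of placeholder signal

# Covariates for a forecast targeting time t come from the window
# [t - horizon - W, t - horizon), i.e. the hour ending one hour before t.
starts = range(0, len(signal) - W - horizon + 1, step)
covariates = np.array([
    [signal[i:i + W].std(),           # e.g. standard deviation
     np.sum(signal[i:i + W] ** 2)]    # e.g. energy (sum of squared signal)
    for i in starts
])
forecast_times = [i + W + horizon for i in starts]  # target sample index per forecast
```

In a live deployment, each row of `covariates` would be fed to the fitted regressions to produce the forecast for the corresponding entry of `forecast_times`, so the per-window computation must complete within the 10-second forecast interval.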
3) Threshold selection: Next, we select a suitable threshold to define the extreme regime to which we will associate the covariates and forecast extremal behaviour, i.e. exceedance. Based on the Anderson-Darling (AD) and Cramér-von Mises (CVM) tests for goodness-of-fit of the excesses to the GPD [23], Figure 1d suggests that under a significance level of 10%, a threshold of 85 decibels would be reasonable since this is the lowest value from the right for which the p-value exceeds the significance level.
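A threshold scan of this kind can be sketched as follows: fit a GPD to the excesses above each candidate threshold and, scanning from the right, keep the lowest threshold whose CVM p-value clears 10%. This Python sketch uses scipy's `genpareto` fit and `cramervonmises` test on simulated data; note that the p-values here are only approximate because the GPD parameters are estimated from the same excesses being tested.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Toy index: a Gaussian bulk plus a genuinely GPD upper tail above 85 dB.
bulk = rng.normal(60.0, 8.0, size=20000)
tail = 85.0 + stats.genpareto.rvs(-0.2, scale=4.0, size=800, random_state=4)
index = np.concatenate([bulk, tail])

def gpd_pvalue(y, u):
    """CVM p-value for a GPD fit to the excesses above threshold u."""
    z = y[y > u] - u
    c, loc, scale = stats.genpareto.fit(z, floc=0.0)   # location fixed at 0
    return stats.cramervonmises(z, stats.genpareto.cdf,
                                args=(c, 0.0, scale)).pvalue

# Scan candidate thresholds from the right; keep the lowest one whose
# p-value clears the 10% significance level.
chosen = None
for u in np.arange(95.0, 70.0, -5.0):
    if gpd_pvalue(index, u) > 0.10:
        chosen = u
    else:
        break
```

The candidate grid and simulated tail are arbitrary; in the paper both the AD and CVM tests are consulted, whereas this sketch uses only the CVM test for brevity.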
4) Covariates: To forecast threshold exceedances of our eruption index, we compute the covariates suggested in [8] for our frequency-filtered data (0.1-1Hz, 1-5Hz, 5-15Hz, 0.1-20Hz and high pass 0.01Hz) and their associated trace envelopes. [8] introduce three domains of representation of the seismic time series which could be useful: the original temporal domain, the frequency domain, where spectral content is obtained via a Fourier transform, and the cepstral domain, where we compute the Fourier transform twice to highlight the harmonic properties of a given signal. For each of these three representations of the seismic traces and their trace envelopes, we can compute:
• Statistical features: e.g. standard deviation, skewness, root mean square bandwidth, and kurtosis, which captures the transition between two signals.
• Entropy features: e.g. Shannon entropy, which describes the distribution of the amplitude levels of a given signal.
• Shape descriptor features: e.g. the rate of attack (ROA), defined as max_i (s_i − s_{i−1})/n, and the rate of decay, defined as min_i (s_i − s_{i−1})/n, where n is the number of observations within the covariate window.

A feature or covariate can take on different meanings depending on the domain it is computed in. For example, the feature 'time of central energy', which is the time around which the signal energy is centred (the time centroid) in the temporal domain, can be interpreted as the fundamental frequency in the frequency domain and the harmonic frequency in the cepstral domain. In addition, while the ratio of the maximum value over the mean value can describe the contrast and relate to the cause of the event in the original temporal domain, in the frequency domain it describes the spectral richness of the signature, and in the cepstral domain, the harmonic content of an observation.
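A handful of these features can be computed per window as in the sketch below; the window content is simulated and the 64-bin histogram used for the Shannon entropy is an arbitrary discretisation choice.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
w = rng.normal(size=360000)        # one placeholder 1-hour window at 100 Hz

def window_features(w):
    """A few of the temporal-domain covariates described above."""
    n = len(w)
    diffs = np.diff(w)
    counts, _ = np.histogram(w, bins=64)          # amplitude-level histogram
    p = counts[counts > 0] / counts.sum()
    return {
        "std": w.std(),
        "skewness": stats.skew(w),
        "kurtosis": stats.kurtosis(w),
        "shannon_entropy": -np.sum(p * np.log2(p)),
        "rate_of_attack": diffs.max() / n,        # max_i (s_i - s_{i-1}) / n
        "rate_of_decay": diffs.min() / n,         # min_i (s_i - s_{i-1}) / n
        "max_over_mean": np.abs(w).max() / np.abs(w).mean(),
    }

feats = window_features(w)
```

The same function applied to the Fourier or cepstral representation of the window yields the frequency- and cepstral-domain versions of each feature.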
To ensure that the model is not too sensitive to extreme covariate values, the covariates were transformed before use. Box-Cox analyses were used to select which power or log transformation was required to make their distributions more similar to Gaussian distributions. After transformation, the covariates were standardised using their means and standard deviations so that they are on similar scales. To account for multicollinearity, we ordered the covariates according to increasing Akaike Information Criterion (AIC) of their univariate models (for threshold exceedance and excesses separately). Then, we removed covariates which had more than 0.6 absolute correlation with covariates deemed more informative than themselves.
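The transformation and pruning steps might look like the following sketch, where the three covariates are simulated and assumed to be already ordered by increasing univariate AIC.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 2000

# Three simulated covariates: one right-skewed, two nearly collinear.
x1 = rng.lognormal(size=n)              # positive, as Box-Cox requires
x2 = rng.normal(size=n)
x3 = x2 + 0.1 * rng.normal(size=n)      # |correlation| with x2 well above 0.6

x1_bc, lam = stats.boxcox(x1)           # fitted Box-Cox transform and lambda

def standardise(x):
    return (x - x.mean()) / x.std()

# Columns assumed already ordered by increasing univariate AIC.
X = np.column_stack([standardise(x1_bc), standardise(x2), standardise(x3)])

# Drop any covariate with |correlation| > 0.6 to a better-ranked one.
keep = []
for j in range(X.shape[1]):
    if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= 0.6 for k in keep):
        keep.append(j)
```

For a lognormal covariate the fitted Box-Cox λ should land near zero, i.e. close to a log transformation, and the near-duplicate third covariate is pruned.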
5) Regression models: As outlined in Section II, we fit a logistic regression for threshold exceedances and a GPD regression for threshold excesses. The shape parameter of the GPD is fixed to the estimate obtained using maximum likelihood estimation for a constant GPD. In this case, this has an asymptotically normal 95% confidence interval of (−0.300, −0.160), which does not include 0. The negative shape parameter implies that the distribution of the excesses is Pareto type II and lies within the Weibull domain of attraction, which contains distributions with short tails, i.e. finite endpoints. The models were chosen by stepwise selection based on AIC. For the logistic regression, the default choice of backward followed by forward selection was used; for the GPD regression, we used forward before backward selection to avoid singularity issues which arise from having a large number of covariates, many of them uninformative, in the model. Steps 1-5 can be repeated to model threshold exceedances and excesses for the other frequency-filtered envelopes.
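Stepwise selection by AIC can be sketched with a greedy forward pass; the paper combines forward and backward steps, so this forward-only sketch on simulated data is a simplification.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
n, p = 1000, 4
X = rng.normal(size=(n, p))
# Only covariates 0 and 2 truly drive exceedance in this simulation.
eta = -1.0 + 1.2 * X[:, 0] + 0.9 * X[:, 2]
I = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

def logistic_aic(cols):
    """AIC of a logistic regression on the given covariate columns."""
    Z = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    nll = lambda b: np.sum(np.logaddexp(0.0, Z @ b)) - I @ (Z @ b)
    res = minimize(nll, np.zeros(Z.shape[1]), method="BFGS")
    return 2 * Z.shape[1] + 2 * res.fun        # AIC = 2k - 2 log L

# Greedy forward pass: repeatedly add the covariate that lowers AIC most.
selected, current = [], logistic_aic([])
while True:
    scores = {c: logistic_aic(selected + [c])
              for c in range(p) if c not in selected}
    if not scores or min(scores.values()) >= current:
        break
    best = min(scores, key=scores.get)
    selected.append(best)
    current = scores[best]
```

The same loop with the GPD negative log-likelihood in place of the logistic one gives the forward pass of the GPD regression selection.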

C. Results
The black line in Figure 2a shows the one-hour-ahead probabilistic forecasts for threshold exceedances of the 1-5Hz envelope. The forecasts are highest for January 2nd, the day the eruption started. Focusing on January 2nd in Figure 2b, the exceedance probability jumps about 1 hour before the recorded eruption onset at 10:20. This means that the fitted forecast model is able to give about 1 hour of warning before the eruption. Similar to [7], we check the goodness of fit of the logistic regression using a deviance chi-squared test. The p-value was effectively zero, indicating that the fitted model is significantly different from a null model. The usefulness of the covariates for explaining the temporal dependence in the occurrences of the threshold exceedances can also be seen through the reduction in autocorrelation of the Pearson residuals in Figure S1a of the Supplementary Information. In contrast, little temporal dependence was observed for the excess residuals in Figure S1b. Hence, there was no real benefit to using covariates to inform a dynamic GPD, and a constant GPD would have sufficed. As we will see later, this is threshold-specific: when we use multiple events to train our model in Section V and select the lowest EVT-informed threshold among the training events, there will be autocorrelation in the excess residuals and hence benefits to modelling with a dynamic GPD.

IV. VALUE OF EXTREME VALUE THEORY
EVT was used to select the threshold which defined the exceedances and excesses being forecasted by the dynamic extreme value model. Figure 3a shows the goodness-of-fit plots comparing the modelled and empirical probabilities and return levels when 50% and 100% of the threshold informed by EVT was used. The latter provided a better fit. Figure 3b shows that for the exceedance forecasting of the 1-5Hz, 0.1-20Hz and high pass 0.01Hz envelope indices, the forecast performance, as measured via the Area Under the Curve (AUC), generally improves as we increase the threshold towards that informed by EVT. Hence EVT has benefits for modelling in terms of both goodness of fit to the data and forecast performance.
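The AUC used here is the Mann-Whitney probability that a randomly chosen exceedance receives a higher forecast than a randomly chosen non-exceedance. A small self-contained sketch on simulated labels and forecasts (equivalent to sklearn's `roc_auc_score`):

```python
import numpy as np

rng = np.random.default_rng(8)
# Simulated exceedance indicators and two competing probability forecasts.
I = rng.binomial(1, 0.2, size=500)
informative = 0.7 * I + 0.3 * rng.uniform(size=500)   # separates the classes
uninformative = rng.uniform(size=500)                 # pure noise

def auc(labels, scores):
    """AUC as the Mann-Whitney probability that a random exceedance is
    scored higher than a random non-exceedance (ties count one half)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

auc_good = auc(I, informative)
auc_poor = auc(I, uninformative)
```

A perfectly separating forecast scores 1, while an uninformative one sits near 0.5, which is the sense in which a higher AUC in Figure 3b reflects better discrimination between exceedances and non-exceedances.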

V. BROADER APPLICABILITY OF THE METHOD

A. Using multiple training events
To assess the forecast capability of the dynamic extreme value model more formally, we will fit the model to all three training events and test on the remaining test event and the three non-events. There are a few ways we could determine a suitable threshold based on data from multiple events. An initial approach might be to simply combine the data across events and use all exceedances to inform the threshold. However, in our analysis, this led to a relatively high threshold estimate of 95 decibels, with Training event 2 dominating the model fit because there were comparatively more exceedances from Event 2 than from Events 1 and 3 (see Section IV of the Supplementary Information).

An alternative approach, which we will use, is to estimate thresholds for the training events separately and use the lowest estimate across events. This ensures that we have sufficient exceedances to represent each event. For the 1-5Hz envelope index, the lowest threshold among the three training events was 85 decibels and the estimated GPD shape parameter was ξ = −0.125 with 95% confidence interval (−0.146, −0.104).

Tables I and II of the Supplementary Information show the chosen covariates with their transformations and parameter estimates for the fitted logistic and GPD regressions respectively. As we will see in the next section, this threshold choice leads to reasonable one-hour-ahead forecast probabilities for all three training events, the test event and the three test non-events. By lowering the threshold to identify extremes across all three training events, we also see more benefit in modelling the threshold excesses dynamically because there is autocorrelation in the excess residuals, particularly for Training event 2 (see Section II of the Supplementary Information). In contrast to our initial illustration with just Training event 3 in Section III, we see that modelling the excess distribution dynamically leads to better estimation of extreme quantiles, though there is still room for improvement. This is illustrated in Figure 4.

B. Training and test performance
After fixing the threshold for which to model exceedances, we fit our dynamic extreme value model to data from the three training events. Figure 5 shows that, apart from an outlier on the first day of Training event 3, the forecast probabilities remain low (e.g. below 0.3) for all three events until the time of their recorded seismic events. For Training event 1 (referring to Figure 5b), sustained high forecast probabilities start during the seismic swarm (between the dashed vertical lines), slightly more than one hour before the recorded eruption at 17:00. For Event 2, one-hour-ahead eruption warnings can also be made as the forecast probabilities begin to take high values during the seismic swarm, about an hour before the recorded eruption at 14:40 (see Figure 5d). For Event 3, the forecast probabilities gradually increase from the time of the seismic swarm around 08:30 before jumping up to a higher plateau about an hour before the recorded eruption onset at 10:20, as shown in Figure 5f. The features of the forecast probabilities, namely the sharp jumps from near-zero for Training events 1 and 2, the gradual increase for Training event 3, the presence of outliers and the tendency to increase one hour before the recorded eruption onsets, stem from the chosen covariates of the logistic regression. As can be inferred from their high coefficient estimates in Table I of the Supplementary Information, the logistic regression for threshold exceedance has three covariates which contribute to the forecast probabilities more than the others: 0.1-20Hz cepstral kurtosis, 0.1-20Hz cepstral skewness and high pass 0.01Hz energy. Figures S12-S15 in the Supplementary Information show the time series of these covariates for the training and test events. Unlike the 0.1-20Hz cepstral kurtosis and skewness, the high pass 0.01Hz energy has more block-like features in its time series. This drives the sharp jumps from near-zero for Training events 1 and 2 during their seismic swarms. In contrast, the change in high pass 0.01Hz energy during the period of the seismic events on January 2nd of Training event 3 was smoother, resulting in a smoother increase in forecast probabilities. The block-like features result from one extreme value in the high pass 0.01Hz signal, which causes high energy values for the length of the moving covariate window (1 hour). Since energy was defined as the sum of the squared signal, this covariate is also very sensitive to outliers. Future work could make the covariates more robust to outliers. Still focusing on the high pass 0.01Hz energy, we notice that the covariates combine to provide good one-hour-ahead forecasts for the eruptions. We hypothesise that the length of the covariate window and the forecast horizon can be optimised depending on the expected seismic crisis and swarm durations at a volcano. If the seismic swarms precede the eruption by a longer time period, as in the test event, the one-hour-ahead forecast probabilities may not be so temporally accurate. This is what we observe in Figure 6a for the one-hour-ahead forecast probabilities for the test event.
Here, we have a seismic crisis of 5 hours 30 minutes instead of the 1-2.5 hour durations of the training events. Referring to Figure S15 of the Supplementary Information, we see that the sole forecast probability outlier on October 13 and the spikes in forecast probabilities in the earlier part of October 14 are in line with spikes in the high pass 0.01Hz energy covariate. However, while the largest energy value was recorded during the seismic swarm (see Figure S15f), the forecast probability was higher nearer to the actual eruption onset. This shows the effect of the other covariates, including the 0.1-20Hz cepstral kurtosis and skewness, which work together to moderate the forecast probabilities. By the nature of the covariates involved, the forecast probabilities are sensitive to different aspects of seismicity.
In addition to the training and test events, we examine the potential of the dynamic extreme value model for eruption forecasting by looking at its performance for the three non-events. In line with expectations, Figures 6b-6d show that the corresponding threshold exceedance forecasts remain very low compared to the magnitudes during the training and test events. Similar training and test results were obtained for the 5-15Hz frequency-filtered data. The corresponding plots are given in Section VII of the Supplementary Information.

VI. DISCUSSION AND OUTLOOK
In Section III, we fitted the dynamic extreme value model to Training event 3, the seismic time series for the January 2010 eruption at Piton de la Fournaise. Promising results were obtained, with spikes in the probabilistic forecasts about an hour prior to the eruption onset. This means that, in this case, one-hour-ahead warnings can be made with the chosen setup: using the 1-5Hz trace envelope as an eruption index with a one-hour covariate window and a one-hour forecast horizon. Similar performance was seen when we fitted the model to more data in Section V. In addition to the good forecast results, we saw the value of using EVT to choose the threshold for which we model exceedances. An appropriate threshold is important because it determines the balance between the number of exceedances used to inform covariate selection and the adherence of the threshold excesses to the asymptotic theory. In general, with a higher threshold, we have fewer exceedances to inform the model, which could mean higher estimation uncertainty, but the excess distribution becomes closer to a GPD. This distribution is useful for computing extreme quantiles such as value at risk in financial risk management. We showed in Section III that using EVT to choose the threshold can improve forecast performance. When we increased the threshold towards its EVT-informed value, the AUC, a measure of how well the model can distinguish between exceedances and non-exceedances, generally increased for the 1-5Hz, 0.1-20Hz and high pass 0.01Hz envelope indices. Intuitively, the threshold determines what it means to be extreme and hence affects the covariates selected and thus the forecast performance. In practice, one would aim to train the forecast model with as much relevant data as possible. In Section V, we demonstrated the considerations to make when incorporating data from different events. For example, we used the p-values of GPD goodness-of-fit tests to identify thresholds for which the excesses can be
seen to follow a GPD. This procedure assumes that the same, constant GPD applies to all the training events. However, as we have observed, different events can suggest different thresholds. Specifically, for Training event 2, we inferred a higher threshold for the energy-related envelope index. This difference in envelope index values between events could be because some eruptions occur closer to the measurement station or because the eruption itself has a higher flux. Our proposed strategy for dealing with the different threshold choices is to take the lowest identified threshold. The GPD regression component of the dynamic extreme value model then models the non-stationarity of the GPD with appropriate covariates. In fact, modelling the excess distribution dynamically was seen to be more useful when we trained the model with multiple events and the lower threshold than previously with just a single training event in Section III. If instead we chose the higher threshold, as suggested when we combined all the training data, we would forecast only large events because Training event 2 would dominate the model fit, resulting in our inability to forecast Training events 1 and 3 well. The proposed modelling framework can be extended in various ways. In our analysis, we used one-hour covariate and forecast windows. More work can be done to optimise these durations, which are likely to be related to the seismic crisis and swarm durations. While the training events had seismic crises which lasted about 1-2 hours, the test event had a much longer seismic crisis duration of 5 hours 30 minutes. This could explain why the training forecasts were more temporally precise than the test forecasts. In our analysis, we focused on seismic signals in the 1-5Hz frequency range. As noted in the literature [15] and observed from the AUC comparison in Figure 3b, some frequency bands can be more useful for eruption forecasting than others. By combining information across useful frequency ranges through a
joint, multivariate model, we can provide a more comprehensive eruption forecast. Similarly, we can incorporate different monitoring signals such as gas emissions and ground deformation in our model. So far, we have only used indices and covariates based on seismic signals due to the prevalence of seismometers as volcano monitoring tools. Future extensions can include different monitoring signals and account for their interdependence and shared covariates via joint models. Data from one seismic station (UV05) were used in our analysis. The promising results indicate that such methods can be useful for volcanoes with only one measuring station. Nevertheless, when we have multiple stations, we can extend the framework to model signals from the monitoring network as a whole and make fuller use of the available data. Since distance from the eruption location is related to higher detected energies, varying forecast probabilities from different stations might help inform not just the timing, but also the location of the eruption. Given that the dynamic extreme value model has worked well for financial forecasting [7] and we have shown that it can be adapted for volcanic eruption forecasting, we postulate that it has high potential to be further adapted for wider applications. In particular, high sampling-rate data were used in both the financial and volcanic contexts. For the former, they were used to compute realised variations over relatively short horizons, while for the latter, they helped to separate different frequency bands of interest. High sampling-rate data relevant to other natural hazards and crises are also being made available; however, what is deemed high sampling-rate is highly context-specific. For example, sea-level data were previously only publicly available at the monthly or annual scales. Hence, 1-15 minute resolutions are deemed high sampling rate. These are increasingly sought after to study extreme sea levels and coastal flooding [24]-[26]. Using such
data, the dynamic extreme value model can be adapted to forecast extreme sea levels and their impact on coastal communities. While high sampling rates are good to have, the framework itself does not depend on them; in the absence of such data, we can still forecast, albeit on a coarser scale. To adapt the dynamic extreme value model to wider settings, there are several general considerations, namely: what is a suitable index to compute threshold exceedances of, what are reasonable forecast horizons and covariate windows, and what covariates can be used to inform future behaviour? One might also consider using algorithms for selecting the threshold automatically (see, for example, [23]). A general practical consideration shared across all contexts, be it finance, volcanoes or other hazards and crises, is the translation of the forecast probabilities into decisive action. What forecast probability warrants a warning or more drastic measures such as evacuation? The optimal strategy may not be straightforward but may involve many competing priorities and constraints, and should be determined on a case-by-case basis with multiple stakeholders and not just by the modellers and scientists.
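The two-stage structure being adapted here can be sketched in a few lines. The following is a minimal, hypothetical illustration (not the paper's fitted model): the one-step-ahead exceedance probability comes from a logistic link and the expected excess size from a GPD whose scale depends log-linearly on the same covariate vector; all coefficient values and the covariate are invented for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def one_step_forecast(x_cov, beta_logit, beta_scale, xi):
    """One-step-ahead forecast from the two model components:
    exceedance probability via a logistic link, and the expected
    excess from a GPD with covariate-dependent scale.
    Coefficients here are hypothetical placeholders."""
    p_exceed = sigmoid(x_cov @ beta_logit)
    sigma = np.exp(x_cov @ beta_scale)   # covariate-dependent GPD scale
    mean_excess = sigma / (1.0 - xi)     # finite only when xi < 1
    return p_exceed, p_exceed * mean_excess

# hypothetical covariate vector: [intercept, past-window energy summary]
x = np.array([1.0, 0.5])
p, expected_excess = one_step_forecast(
    x, beta_logit=np.array([-2.0, 1.0]),
    beta_scale=np.array([0.0, 0.0]), xi=0.2)
print(p, expected_excess)
```

Multiplying the exceedance probability by the conditional mean excess gives a simple point summary of the forecast tail risk; in practice one would report the full forecast distribution.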

ACKNOWLEDGMENT
The data used for the analysis were collected by the Institut de Physique du Globe de Paris, Observatoire Volcanologique du Piton de la Fournaise (IPGP/OVPF) and the Laboratoire de Géophysique Interne et Tectonophysique (LGIT) within the framework of the ANR 08 RISK 011/UnderVolc project. The sensors are the property of the réseau sismologique mobile Français, Sismob (INSU-CNRS). This work is supported by

A Bayesian approach can be applied to both event trees and belief networks to embed prior information as well as facilitate updating with new data. While associating tree or network nodes with meaningful phenomena and factors influencing an eruption is useful for interpretation, a priori decisions need to be made about what these factors should be. There is a need to consciously tailor the models to each volcanic and monitoring system. In addition, many of these models define thresholds at each node to turn continuous parameters into discrete states, losing the richness of the input data. Process/source models also make assumptions about unknown subsurface properties and volcanic processes which can differ between volcanoes [19]-[23]. By grouping similar signals from time series data and associating them with source mechanisms, we can track the evolution through the stages of volcanic activity and hence forecast, for example, the eruption time. Another well-established method is failure forecasting [24]-[28]. For any acceleratory geophysical precursor such as deformation or seismicity, we can fit a curve to the plot of its rate against time based on an empirical law of material failure. This enables us to estimate the failure time as the time at which the acceleration reaches infinity. While failure forecasting is relatively easy to implement, one needs to choose appropriate signals to monitor and relate their failure time to the eruption onset time. This would depend on, for example, sensor location and depth. It can also be difficult to accurately quantify the uncertainty associated with the estimated onset
time because the resulting non-Gaussian model residuals conflict with the normality assumption underlying the regression used to fit the model. Other limitations include sensitivity to data truncation and to signal fluctuations or deceleration which can occur in the immediate period before an eruption. The method we propose in this paper falls under the category of machine learning. By using mathematical algorithms, we can learn patterns from data, including complex associations which we may not identify by eye or have imagined beforehand. While machine learning methods such as multi-layer perceptrons, neural networks, radial basis functions, CART regression or decision trees and support vector regression have been applied to eruption forecasting, their full and varied potential has not yet been realised [11], [29], [30].
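The failure forecast method described above is commonly implemented via the inverse-rate technique: under the frequently assumed exponent of 2 in the empirical failure law, the inverse of the precursor rate decreases linearly with time and reaches zero at the failure time. A minimal sketch on synthetic data (the precursor series is invented for illustration):

```python
import numpy as np

def inverse_rate_forecast(times, rates):
    """Estimate failure time by linear extrapolation of the inverse
    precursor rate to zero (failure forecast method, exponent alpha = 2)."""
    inv_rate = 1.0 / np.asarray(rates, dtype=float)
    slope, intercept = np.polyfit(times, inv_rate, 1)
    return -intercept / slope  # time at which 1/rate hits zero

# synthetic accelerating precursor: rate = 1 / (t_f - t) with t_f = 10.0
t = np.linspace(0.0, 8.0, 50)
rates = 1.0 / (10.0 - t)
t_f_hat = inverse_rate_forecast(t, rates)
print(t_f_hat)  # ≈ 10.0
```

The difficulty noted in the text shows up directly here: the ordinary least-squares fit assumes Gaussian residuals in the inverse rate, so the uncertainty on the extrapolated intercept is easy to understate when the residuals are non-Gaussian.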

II. EXAMINING THE SUFFICIENCY OF A CONSTANT GPD MODEL
In Section III of the main paper, we showed how we can use the dynamic extreme value model to forecast volcanic eruptions using data from the January 2010 eruption event (Training set 3). For this dataset, we concluded that while the computed covariates were useful for explaining the temporal dependence in threshold exceedances, there was less temporal dependence in the excesses. Hence, there was no benefit to using covariates to inform a dynamic Generalised Pareto Distribution (GPD) for the excesses, and a constant GPD would have sufficed. This was not the case when we later trained the model on multiple events. Here, we provide the plots to support these conclusions.
Figure S1a shows the estimated autocorrelation against temporal lag for the Pearson residuals from the fitted logistic regression and the constant probability model. Since the black line is generally lower than the red line, the residuals exhibit less correlation when conditioned on the chosen covariates. This indicates that the conditional independence assumption in the logistic regression is better adhered to than in the constant probability model. In contrast, Figure S1b shows the estimated autocorrelations of the excess residuals from the dynamic GPD regression and the constant GPD model. The excess residuals are defined as the differences between the observed excesses and the predicted means. The plot shows that the residuals from a constant GPD model do not exhibit much autocorrelation, so there is no incentive to model it dynamically. This is despite the fact that the AIC for the constant GPD model is higher than that for the dynamic GPD model (1663.599 versus 1614.226).
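The diagnostic in Figure S1a can be reproduced in miniature on synthetic data: compute Pearson residuals for the binary exceedance indicators under a well-specified time-varying probability and under a constant probability, and compare their sample autocorrelations against the 95% Bartlett band. The probability series and all values below are invented stand-ins, not the paper's data.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation at lags 1..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.sum(x * x)
    return np.array([np.sum(x[:-k] * x[k:]) / denom
                     for k in range(1, max_lag + 1)])

def pearson_residuals(y, p):
    """Pearson residuals for binary indicators y with fitted probabilities p."""
    return (y - p) / np.sqrt(p * (1.0 - p))

rng = np.random.default_rng(0)
n = 2000
t = np.arange(n)
p_true = 1.0 / (1.0 + np.exp(-2.0 * np.sin(2 * np.pi * t / 200)))  # slowly varying
y = (rng.random(n) < p_true).astype(float)

acf_dynamic = sample_acf(pearson_residuals(y, p_true), 5)            # "fitted" model
acf_constant = sample_acf(pearson_residuals(y, np.full(n, y.mean())), 5)
bartlett = 1.96 / np.sqrt(n)   # 95% white-noise band
print(acf_dynamic.round(3), acf_constant.round(3), round(bartlett, 3))
```

Residuals under the time-varying probability should fall inside the Bartlett band, while the constant-probability residuals inherit the autocorrelation of the underlying probability series, mirroring the black-versus-red comparison in Figure S1a.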
When we train the model on multiple events, we see that there is autocorrelation in the Pearson residuals prior to fitting the logistic regression as well as autocorrelation in the excess residuals prior to modelling the excesses with the GPD regression (seen most strongly for Training events 1 and 2).The corresponding plots showing the benefits of modelling these autocorrelations with the logistic regression and GPD regressions are shown in Figure S2.These dynamic models reduce the residual autocorrelations substantially (the black lines generally lie much lower than the red lines).

III. SIGNAL AND ENVELOPE INDICES FOR TRAINING EVENTS AND TEST NON-EVENTS
The time series plots for the raw signal and 1-5Hz envelope indices for Training events 1 and 2, the Test event and Test non-events 1, 2 and 3 are given in Figures S3-S8.

IV. THRESHOLD SELECTION ACROSS MULTIPLE TRAINING EVENTS
There are a few ways we could determine a suitable threshold based on data from multiple events. An initial approach might be to simply combine the data across events and use all exceedances to inform the threshold. Figure S9 shows that this leads to a relatively high threshold of 95 being selected. In addition, since there were comparatively more exceedances from Event 2 than Events 1 and 3 (260 exceedances versus 14 and 3 exceedances for Events 1 and 3 respectively), the model fitted the data from Event 2 better than Events 1 and 3. This is reflected in the much higher forecast probabilities near the eruption period for Event 2 than for Events 1 and 3, as seen in Figure S10. An alternative approach for choosing a suitable threshold across all training events, which we eventually use, is to estimate thresholds for the training events separately and take the lowest estimate across events. This ensures that we have sufficient exceedances to represent each event.
Figure S11 shows the p-values for the GPD goodness-of-fit tests for Events 1 and 2 for the 1-5Hz envelope index. The corresponding plot for Event 3 is shown in Figure 1d of the main paper. From these plots, the threshold estimates for Events 1, 2 and 3 are 89, 96 and 85 respectively. We see that the previous estimate of 95, obtained by combining the data across events, was driven largely by Event 2. With the threshold choice of 85, the lowest estimate among the events, we have 282 exceedances for Training Event 1, 1033 for Event 2 and 423 for Event 3. As shown in Section V of the main paper, one-hour ahead forecast probabilities of sensible magnitudes were obtained for all three training events, the test event and the three non-events.
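The per-event threshold search can be sketched as a scan over candidate thresholds, fitting a GPD to the excesses at each and accepting the lowest threshold whose goodness-of-fit p-value exceeds the 10% level. The sketch below uses a Cramér-von Mises test (the paper also uses Anderson-Darling, which SciPy does not provide for an arbitrary fitted distribution) on synthetic data; the numbers and the 85-threshold tail are invented for illustration, and the p-values are only approximate because the GPD parameters are estimated from the same excesses.

```python
import numpy as np
from scipy import stats

def select_threshold(x, candidates, alpha=0.10, min_excesses=30):
    """Lowest candidate threshold whose excesses are compatible with a
    fitted GPD by a Cramér-von Mises test (approximate p-value > alpha)."""
    for u in sorted(candidates):
        excesses = x[x > u] - u
        if len(excesses) < min_excesses:
            continue  # too few excesses for a meaningful test
        c, _, scale = stats.genpareto.fit(excesses, floc=0.0)
        res = stats.cramervonmises(excesses,
                                   stats.genpareto(c, scale=scale).cdf)
        if res.pvalue > alpha:
            return u
    return None

# synthetic envelope-index-like data: Gaussian bulk plus a GPD tail above 85
rng = np.random.default_rng(1)
bulk = rng.normal(70.0, 5.0, size=5000)
tail = 85.0 + stats.genpareto.rvs(0.1, scale=3.0, size=400, random_state=2)
u_hat = select_threshold(np.concatenate([bulk, tail]), candidates=range(80, 100))
print(u_hat)
```

Running this per event and taking `min()` of the returned thresholds implements the "lowest estimate across events" strategy adopted above.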

VI. MAJOR COVARIATES IN THE LOGISTIC REGRESSION
From Table I, we see that the three covariates with the highest absolute coefficient estimates in the logistic regression for the 1-5Hz envelope index are the 0.1-20Hz cepstral kurtosis (cep kurtosis 0120), the 0.1-20Hz cepstral skewness (cep skew 0120) and the high pass 0.01Hz energy (energy hp001). Figures S12-S15 show their corresponding time series for the training and test events.

VII. RESULTS FOR THE 5-15HZ ENVELOPE

Similar results for the dynamic extreme value model were obtained for the 5-15Hz envelope. Figure S16 shows the one-hour ahead threshold exceedance forecasts for the three training events using the lowest threshold estimated across events for the 5-15Hz envelope index. Figure S17 shows that while sensible spikes in forecast probabilities in line with the eruption activity were observed for the test event, the probability estimates remained low throughout the three non-events.
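Ranking covariates by the absolute size of their logistic regression coefficients, as done with Table I, can be sketched as follows. The data are synthetic and the variable names merely echo those in Table I (plus a hypothetical low-influence `rms_0120`); the fit uses iteratively reweighted least squares rather than any particular package.

```python
import numpy as np

def fit_logistic_irls(X, y, n_iter=25):
    """Fit a logistic regression by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)                       # observation weights
        # Newton step: beta += (X' W X)^{-1} X' (y - p)
        beta = beta + np.linalg.solve((X * w[:, None]).T @ X, X.T @ (y - p))
    return beta

# synthetic stand-ins for the covariates (hypothetical data, standardised)
rng = np.random.default_rng(0)
names = ["cep_kurtosis_0120", "cep_skew_0120", "energy_hp001", "rms_0120"]
X = rng.normal(size=(1000, 4))
true_beta = np.array([1.5, -1.0, 2.0, 0.1])
y = (rng.random(1000) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

beta_hat = fit_logistic_irls(X, y)
ranking = sorted(zip(names, beta_hat), key=lambda nb: -abs(nb[1]))
print([name for name, _ in ranking])
```

Note that comparing raw coefficient magnitudes is only meaningful when the covariates are on comparable (e.g. standardised) scales, as they are in this synthetic example.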
EVENTS AND NON-EVENTS USED TO TRAIN AND TEST THE FORECASTING MODEL. E-C REFERS TO AN ERUPTION THAT REMAINS INSIDE THE SUMMIT CRATERS (DOLOMIEU OR BORY) AND E-L REFERS TO AN ERUPTION THAT STARTS ANYWHERE ELSE OUTSIDE OF THE SUMMIT CRATERS.

Fig. 1. (a) Seismic signal for the 1-5Hz frequency band and (b) the corresponding envelope index in decibels (dB) for the January 2010 eruption event. The dotted, dashed and bold blue vertical lines denote the start of the seismic crisis, the start and end of the seismic swarm and the eruption onset respectively. Note that the time series are decimated such that every 5183rd reading and every 402nd reading are shown for the top and bottom plots respectively. (c) Model conceptual diagram for the forecasting of the exceedance of the 1-5Hz envelope index based on covariates computed from signals and indices across multiple frequency bands within past time windows. (d) p-values from the Anderson-Darling (AD) and Cramér-von Mises (CVM) tests for goodness-of-fit of the excesses to the GPD (we select the lowest threshold for which the p-value exceeds the 10% significance level).

Fig. 2. (a) Training one-hour-ahead exceedance forecasts based on the logistic regression for the 1-5Hz envelope index and the January 2010 eruption; (b) zoomed into the period of significant volcanic activity (the dotted, dashed and bold blue vertical lines denote the start of the seismic crisis, the start and end of the seismic swarm and the eruption onset respectively). Here, the value of the black line at 09:00, for example, indicates the forecasted probability of exceedance for 10:00.

Fig. 3. (a) Improvement in goodness-of-fit when the threshold used increases from 50% (top) to 100% (bottom) of the value chosen by extreme value theory; (b) Improvement in the training performance in terms of Area under the Curve (AUC) for the 1-5Hz, 0.1-20Hz and high pass 0.01Hz index exceedance models when the threshold increases.

Fig. 4. With multiple training events: Comparison in goodness-of-fit in terms of the empirical and theoretical quantiles of the standardised excesses when we use (a) the GPD regression for the excess distributions instead of (b) treating the excess distribution as static through a constant GPD. Since the extreme quantiles lie closer to the one-to-one diagonal line for the GPD regression, it fits the data better.
We see from Figures S12-S15 that the covariate values tend to increase during the seismic swarms. Since the seismic swarms for Training events 1, 2 and 3 occur slightly more than one hour prior to their recorded eruptions, monitoring the energy values seems well suited to one-hour-ahead forecasting.

Fig. 5. One-hour ahead threshold exceedance forecasts for the three training events using the lowest threshold estimated across events. The dotted, dashed and bold blue vertical lines denote the times of the seismic crises, the start and end of the seismic swarms and the eruption onsets respectively.

Fig. 6. One-hour ahead threshold exceedance forecasts using the lowest threshold estimated across events for: (a) the test event, (b)-(d) the test non-events. The dotted, dashed and bold blue vertical lines in Plot (a) denote the times of the seismic crisis, the seismic swarm and the eruption onset respectively.

Fig. S1. (a) Estimated autocorrelation function (ACF) for the Pearson residuals from the fitted logistic regression (black) and the constant probability model (red dotted); (b) estimated ACF for the residuals from the fitted GPD regression (black) and the constant GPD model (red dotted). In the ACF plots, the blue dashed lines denote the 95% Bartlett confidence intervals for white noise.
(a) Training event 1: ACF of Pearson residuals (b) Training event 1: ACF of excess residuals (c) Training event 2: ACF of Pearson residuals (d) Training event 2: ACF of excess residuals (e) Training event 3: ACF of Pearson residuals (f) Training event 3: ACF of excess residuals

Fig. S2. For the model fitted using all three training events and the lowest threshold estimate: estimated autocorrelation function (ACF) for the Pearson residuals from the fitted logistic regression (black) and the constant probability model (red dotted) in Plots (a), (c) and (e); estimated ACF for the residuals from the fitted GPD regression (black) and the constant GPD model (red dotted) in Plots (b), (d) and (f). The blue dashed lines denote the 95% Bartlett confidence intervals for white noise.

Fig. S3. Training set 1: (a) Seismic signal for the 1-5Hz frequency band and (b) the corresponding envelope index in decibels (dB) for the November 2009 eruption event. The dotted, dashed and bold blue vertical lines denote the start of the seismic crisis, the start and end of the seismic swarm and the eruption onset respectively. Note that the time series are decimated such that every 5184th reading and every 330th reading are shown for the top and bottom plots respectively.

Fig. S4. Training set 2: (a) Seismic signal for the 1-5Hz frequency band and (b) the corresponding envelope index in decibels (dB) for the December 2009 eruption event. The dotted, dashed and bold blue vertical lines denote the start of the seismic crisis, the start and end of the seismic swarm and the eruption onset respectively. Note that the time series are decimated such that every 3456th reading and every 306th reading are shown for the top and bottom plots respectively.

Fig. S5. Test event: (a) Seismic signal for the 1-5Hz frequency band and (b) the corresponding envelope index in decibels (dB) for the October 2010 eruption event. The dotted, dashed and bold blue vertical lines denote the start of the seismic crisis, the start and end of the seismic swarm and the eruption onset respectively. Note that the time series are decimated such that every 3456th reading and every 306th reading are shown for the top and bottom plots respectively.

Fig. S6. Test non-event 1: (a) Seismic signal for the 1-5Hz frequency band and (b) the corresponding envelope index in decibels (dB) for November 30 2009. Note that the time series are decimated such that only every 1728th reading is shown.

Fig. S7. Test non-event 2: (a) Seismic signal for the 1-5Hz frequency band and (b) the corresponding envelope index in decibels (dB) for December 22-23 2009. Note that the time series are decimated such that only every 3456th reading is shown.

Fig. S8. Test non-event 3: (a) Seismic signal for the 1-5Hz frequency band and (b) the corresponding envelope index in decibels (dB) for May 7-8 2010. Note that the time series are decimated such that only every 3456th reading is shown.

Fig. S10. One-hour ahead threshold exceedance forecasts for the three training events using a threshold informed by data combined across events. Compared to Training event 2, the model is less able to distinguish and hence forecast eruptive activity for the other two training events because they contribute very few exceedances to inform the fit.

Fig. S12. Training event 1: Time series plots of the three major covariates in the logistic regression.

Fig. S13. Training event 2: Time series plots of the three major covariates in the logistic regression.

Fig. S14. Training event 3: Time series plots of the three major covariates in the logistic regression.

Fig. S15. Test event: Time series plots of the three major covariates in the logistic regression.
Fig. S16. 5-15Hz envelope index: One-hour ahead threshold exceedance forecasts for the three training events using the lowest threshold estimated across events. The dotted, dashed and bold blue vertical lines denote the times of the seismic crises, the start and end of the seismic swarms and the eruption onsets respectively.

Fig. S17. 5-15Hz envelope index: One-hour ahead threshold exceedance forecasts using the lowest threshold estimated across events for: (a) the test event, (b)-(d) the test non-events. The dotted, dashed and bold blue vertical lines in Plot (a) denote the times of the seismic crisis, the seismic swarm and the eruption onset respectively.

TABLE I. PARAMETER ESTIMATES FOR THE LOGISTIC REGRESSION FOR THE THRESHOLD EXCEEDANCE OF THE 1-5HZ ENVELOPE INDEX. HERE, ROA REFERS TO THE RATE OF ATTACK, RMS REFERS TO ROOT MEAN SQUARE AND IOCE REFERS TO THE INDEX OF CENTRAL ENERGY.

TABLE II. PARAMETER ESTIMATES FOR THE GPD REGRESSION FOR THE THRESHOLD EXCESSES OF THE 1-5HZ ENVELOPE INDEX. HERE, ROA REFERS TO THE RATE OF ATTACK, RMS REFERS TO ROOT MEAN SQUARE AND IOCE REFERS TO THE INDEX OF CENTRAL ENERGY.