Introduction

The anticipated duration of future or on-going volcanic eruptions is often a topic of much concern in volcanically active areas, yet systematic studies of eruption duration are rare (Mulargia et al. 1985; Stieltjes and Moutou 1989; Simkin 1993; Sparks and Aspinall 2004; Mastin et al. 2009). Analyses of eruption durations can provide probabilistic constraints on the likely duration of future or on-going eruptions which could greatly benefit emergency response planning at times of volcanic crisis. Although much research has been conducted on forecasting the likely start of eruptions using statistical analysis of repose intervals (see Marzocchi and Bebbington (2012) for a review), the same cannot be said for duration data as a tool for forecasting the ends of eruptions. The aims of this paper are therefore to present a set of duration data and use it to illustrate a general statistical method of forecasting likely duration (independent of any other information) using Mt. Etna as a case study, chosen for its well-documented historical record.

The duration of a volcanic eruption can be defined as the period of time when fresh volcanic material is being emitted at the Earth’s surface. Here, we consider a period of continuous magma discharge as the basic building block of an eruption. However, the intensity of volcanic activity during an eruption is rarely constant. More often, discrete phases of heightened activity separated by periods of surface quiescence lasting hours, days or months can be observed (Simkin 1993; Siebert et al. 2010). The Smithsonian Institution’s Global Volcanism Program considers eruptive phases separated by periods of quiescence of less than 3 months as the same eruption, unless there are significant reasons to treat them as distinct events (Venzke et al. 2002; Siebert et al. 2010). However, the degree and duration of a quiescent pause required to warrant grouping a series of eruptive phases as one eruption, or splitting a series of eruptive phases into more than one eruption, is likely to depend on local circumstances. A similar argument applies to defining durations of repose periods.

This paper begins by critically assessing the available data on the duration of flank eruptions at Mt. Etna and presents a list of reliable eruption duration data. It goes on to describe and summarise these data using empirical survivor function plots and to assess variations in the distribution of eruption duration with time and location. The paper ends by demonstrating how survivor function statistics can be used to forecast the duration of future and on-going eruptions. Although the focus of this paper is Mt. Etna, the methods used to describe and forecast eruption durations are applicable to other volcanoes with well-documented historical activity.

Data selection

Mt. Etna background

Mt. Etna is the most active volcano in Europe, and consequently, it is one of the most widely studied and documented volcanoes in the world (Andronico and Lodato 2005). Hazard studies of Mt. Etna began in the late 1970s and early 1980s focussing on patterns in historic eruptions and predicting the location of future activity (Frazzetta and Romano 1978; Guest and Murray 1979; Duncan et al. 1981). Since then, numerous studies have built on this work by analysing catalogues of historic eruptions (Mulargia et al. 1985; Behncke and Neri 2003; Branca and Del Carlo 2004; 2005; Salvi et al. 2006; Neri et al. 2011; Smethurst et al. 2009; Passarelli et al. 2010; Proietti et al. 2011) and producing susceptibility and probabilistic hazard maps of surrounding areas (Andronico and Lodato 2005; Bisson et al. 2009; Behncke et al. 2005; Crisci et al. 2010; Harris et al. 2011; Cappello et al. 2012, 2013).

Two types of volcanic activity have been recognised in the historical records of Mt. Etna: persistent activity from summit vents and periodic activity from eruptive fissures on the volcano’s flanks (Guest and Murray 1979; Duncan et al. 1981; Acocella and Neri 2003; Behncke and Neri 2003; Branca and Del Carlo 2005; Crisci et al. 2010). Despite the typically explosive nature of summit activity, its effects are often localised to within a few hundred to a few thousand metres of the eruption site and therefore its threat to property and surrounding populations is confined to elevations above 1600–1800 m above sea level; consequently, only the tourist facilities are potentially exposed to the risk of lava invasion (Duncan et al. 1981; Proietti et al. 2011; Cappello et al. 2013). However, flank eruptions tend to produce lava flows that can extend for far greater distances and to lower elevations making them the greatest hazard on Mt. Etna (Duncan et al. 1981; Chester et al. 1985; Behncke and Neri 2003; Andronico and Lodato 2005; Behncke et al. 2005; Proietti et al. 2011). This greater relevance to lava flow hazard assessment, and the fact that the historical record of flank eruptions is considered reliable and nearly complete after 1600 AD (Mulargia et al. 1985; Behncke and Neri 2003; Branca and Del Carlo 2004; Behncke et al. 2005; Branca and Del Carlo 2005; Tanguy et al. 2007), whereas that of summit eruptions is only considered reliable after the late nineteenth century (Chester et al. 1985; Andronico and Lodato 2005; Branca and Del Carlo 2005; Proietti et al. 2011), led us to exclude summit activity from this analysis and focus only on flank eruptions. Mt. Etna’s flank eruptions occur from vents that are distributed unevenly across the volcano, being mostly concentrated in three rift zones and the Valle del Bove (Duncan et al. 1981; Acocella and Neri 2003; Behncke et al. 2005). Our compiled data includes information on vent location in order to investigate any relationships between eruption duration and location.

Mt. Etna eruption duration data

The dataset used here contains flank eruptions from 1300 to 2010. It is a result of a critical examination of the catalogues and descriptions of summit and flank activity compiled by Tanguy (1981), Mulargia et al. (1985), Behncke and Neri (2003), Branca and Del Carlo (2004), Behncke et al. (2005), Branca and Del Carlo (2005), Tanguy et al. (2007) and Neri et al. (2011) and, in specific cases, additional information gleaned from other sources. For this study, we are primarily interested in the duration of each flank eruption, so in those cases where flank activity occurred during a longer period of summit activity, the dates used are restricted to those of the flank component only. For example, volcanic activity began from both summit and flank vents on 18 May 1780. Summit activity continued into July (Tanguy et al. 2007), whereas the flank component of this eruption ended earlier, with reported end dates ranging from 28 to 31 May 1780 (Branca and Del Carlo 2004; Behncke et al. 2005; Branca and Del Carlo 2005; Tanguy et al. 2007). For this study, the dates of the flank activity are used and this eruption is reported as starting on 18 May and ending on 29 May 1780. In a few other cases (e.g. May 1759), the precise dates of flank activity during times of summit activity are not reported. These flank eruptions have been excluded.

Some eruptions on Mt. Etna consist of more than one eruptive phase separated by periods of quiescence ranging from hours to days. An argument could be made that each phase constitutes a separate eruption; however, because some eruptions are described in detail whereas others are more vague, it is unrealistic to assume that we have information about every quiescent period that occurred on Mt. Etna between the years 1300 and 2010. Instead, we propose that periods of quiescence of less than 10 days between eruptive phases are not sufficient to warrant separating an eruptive sequence into two eruptions.
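The 10-day grouping rule can be illustrated with a minimal sketch. The function name `merge_phases` and the phase dates below are hypothetical and are not taken from Table 2; the sketch only assumes that each eruptive phase is available as a start and end date.

```python
from datetime import date

def merge_phases(phases, max_gap_days=10):
    """Group eruptive phases into eruptions.

    phases: list of (start, end) date tuples.
    Phases separated by a quiescent gap of fewer than max_gap_days
    are treated as parts of the same eruption.
    """
    eruptions = []
    for start, end in sorted(phases):
        if eruptions and (start - eruptions[-1][1]).days < max_gap_days:
            # Short pause: extend the current eruption.
            prev_start, prev_end = eruptions[-1]
            eruptions[-1] = (prev_start, max(prev_end, end))
        else:
            # First phase, or a pause of 10 days or more: new eruption.
            eruptions.append((start, end))
    return eruptions

# Hypothetical phases for illustration only.
phases = [
    (date(2000, 1, 1), date(2000, 1, 5)),
    (date(2000, 1, 9), date(2000, 1, 20)),   # 4-day pause: same eruption
    (date(2000, 3, 1), date(2000, 3, 10)),   # 41-day pause: new eruption
]
eruptions = merge_phases(phases)
for start, end in eruptions:
    print(start, end, (end - start).days)
```

In this sketch the first two phases are merged into one 19-day eruption and the third is kept separate.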

Accounting for uncertainty

Uncertainties in the start and/or end dates of each eruption were considered in detail. One source of uncertainty is contradictory reporting. For example, the 1911 flank eruption is documented by Acocella and Neri (2003), Behncke and Neri (2003), Andronico and Lodato (2005), Behncke et al. (2005) and Neri et al. (2011) as starting on 10 and ending on 22 September, and these dates were chosen as the preferred start and end dates of this eruption in this study. However, Mulargia et al. (1985) reported this eruption as starting 1 day earlier (9 September). To account for this, an uncertainty in the duration of +1 day has been assigned to the eruption’s start date. Furthermore, Tanguy (1981) and Tanguy et al. (2007) reported this eruption as ending 1 day earlier (21 September), whereas Branca and Del Carlo (2004) and Branca and Del Carlo (2005) reported it as ending 1 day later (23 September). Here, an uncertainty in the duration of both +1 and −1 day has been assigned to the eruption’s end date. This results in a preferred eruption duration of 12 days (10 to 22 September) with a maximum duration uncertainty of +2 days (9 to 23 September) and −1 day (10 to 21 September); thus, the total duration of this eruption could range from 11 to 14 days. This method has been applied to all eruptions with contradictory start and/or end dates reported in the literature.
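The arithmetic for the 1911 eruption can be reproduced with a short sketch. The helper `duration_range` is an illustrative name and not part of the original analysis; the dates are those quoted above.

```python
from datetime import date

def duration_range(preferred, alt_starts=(), alt_ends=()):
    """Return (preferred, shortest, longest) durations in days.

    preferred: (start, end) dates adopted in this study.
    alt_starts, alt_ends: alternative dates reported in other sources.
    """
    p_start, p_end = preferred
    starts = [p_start, *alt_starts]
    ends = [p_end, *alt_ends]
    pref = (p_end - p_start).days
    longest = (max(ends) - min(starts)).days    # earliest start to latest end
    shortest = (min(ends) - max(starts)).days   # latest start to earliest end
    return pref, shortest, longest

pref, shortest, longest = duration_range(
    preferred=(date(1911, 9, 10), date(1911, 9, 22)),
    alt_starts=[date(1911, 9, 9)],                      # Mulargia et al. (1985)
    alt_ends=[date(1911, 9, 21), date(1911, 9, 23)],    # Tanguy; Branca and Del Carlo
)
print(pref, shortest - pref, longest - pref)  # 12 -1 2
```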

A second source of uncertainty arises where the start and/or end date of an eruption has been reported only to the nearest month or year. Here, a date was assigned along with a number of days uncertainty, according to the method adopted by Bebbington and Lai (1996) and Benoit and McNutt (1996) (Table 1). Sometimes, despite an eruption’s start or end only being known to the nearest month, slightly more qualitative information is provided indicating that it was ‘early,’ ‘mid’ or ‘late’ in that month. Again, the method of Benoit and McNutt (1996), summarised in Table 1, was applied.

Table 1 Table of assigned dates and uncertainties

Where all sources examined give the same start and end date for an eruption an uncertainty value is assigned based on whether the eruption is reported to the nearest day or whether hourly resolution is provided in the primary literature (Table 1).

Some eruptions carry both literature-derived uncertainties and assigned uncertainties. For example, the 1755 eruption has a preferred duration of 6 days. This duration carries a +1 day uncertainty which is derived from differences in the reported start date. The precise times of day that the eruption started and ended are unknown and although this literature-derived uncertainty covers the potential for the eruption duration to have been slightly longer than 6 days, it does not allow for it to be slightly shorter. To account for this, a −0.5 day uncertainty in the eruption duration is assigned according to the ‘nearest day’ category of Table 1. The maximum uncertainty in the duration for this eruption is therefore +1 day and −0.5 days.

Eighty known or suspected flank eruptions are reported from 1300 AD to 2010; however, three of these are excluded as their location is ambiguous and they may be best described as summit eruptions (September 1869, February 1999 and July 2006). A further 11 eruptions have unknown durations (1333, August 1381, 1444, September 1446, September 1578/79, June 1607, March 1689, May 1759, 1764, July 1787, and November 1918) and four were excluded due to their duration uncertainty being greater than 50 % of their total preferred duration (November 1566, September 1682, August 1874 and December 1949). This results in 62 eruptions considered to have reliable durations (listed in Table 2) that can be used in the following analyses; 49 of these eruptions carry duration uncertainties of less than ±10 %.
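The selection arithmetic and the 50 % criterion can be sketched as follows. How the plus and minus uncertainties are combined into a single relative uncertainty is not spelled out in the text; taking the larger of the two is an assumption made here for illustration, and `is_reliable` is a hypothetical helper name.

```python
def is_reliable(preferred, plus, minus, max_fraction=0.5):
    """True if the duration uncertainty is no more than max_fraction of the
    preferred duration. ASSUMPTION: the larger of the +/- values is used."""
    return max(plus, minus) <= max_fraction * preferred

# Counts quoted in the text.
reported = 80
ambiguous_location = 3
unknown_duration = 11
too_uncertain = 4
retained = reported - ambiguous_location - unknown_duration - too_uncertain
print(retained)  # 62

# Two eruptions discussed in the text: (preferred, plus, minus) in days.
print(is_reliable(12, 2, 1))     # 1911 eruption
print(is_reliable(6, 1, 0.5))    # 1755 eruption
```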

Table 2 Dataset of historical Mt. Etna flank eruptions with known durations, 1300–2010

Additional information on specific eruptions

Tanguy et al. (2007) provide the most comprehensive catalogue of historical Etna eruptions extending from 1600 to 2003. The majority of the eruptions within this time period that are included in Table 2 are also reported by Tanguy et al. (2007), although sometimes, where numerous other sources give alternative dates, their dates are not used but are covered in the eruption’s assigned uncertainty. Two eruptions, however, are used here but not included by Tanguy et al. (2007). These are the February 1643 and the January 1968 eruptions (#8 and #41, Table 2). The latter eruption is documented in numerous other sources, including Tanguy (1981). Its exclusion by Tanguy et al. (2007) may have been an oversight, with other eruptions between 1966 and 1970 included in Tanguy (1981) but missing from Tanguy et al. (2007). The 1968 eruption is therefore included in our dataset using information from other sources (Table 2). The February 1643 eruption is excluded by Tanguy et al. (2007) due to some confusion in the literature between its vent location and the location of the 1646-7 lava flows (Tanguy et al. 2007); however, we include this eruption here, using the dates reported by Behncke et al. (2005) and Tanguy (1981).

Information about the dates of three other eruptions differs significantly from that recorded within the catalogue of Tanguy et al. (2007). These are the March 1956 and the February and November 1975 eruptions (#39, #45 and #46, Table 2). The flank eruption of March 1536 (#3, Table 2) was accompanied by summit activity that continued until the end of the year (Siebert et al. 2010; Tanguy et al. 2007). The flank component of this eruption is reported as ending in April (Behncke et al. 2005), whereas the information within Appendix 1 of Tanguy et al. (2007) states that the eruption ‘probably ended on 8 April.’ To account for this uncertainty, the precision to which the end date is known is considered to be in the ‘early month’ category of Table 1 so the 5 April is assigned with a ±5 day duration uncertainty (Table 2).

The two 1975 flank eruptions also occurred during a period dominated by summit activity. Such close association between the summit and flank activity makes isolating the dates of the flank component difficult and Tanguy et al. (2007) have simply recorded these eruptions within the longer summit activity. Other workers tried to resolve this, and it is the dates and uncertainty within these alternative references that are included in Table 2.

Mt. Etna vent location data

Flank eruptions at Mt. Etna are often associated with multiple aligned vents or fissures radiating from the volcano’s summit (Acocella and Neri 2003). Table 2 and Fig. 1 contain information about the location of each eruption, derived from maps by Romano et al. (1979), Chester et al. (1985), Acocella and Neri (2003) and Branca et al. (2011).

Fig. 1

Sketch map of Mt. Etna based on (Romano et al. 1979) and (Branca et al. 2011) showing the extent of erupted material and the position of their vents or fissures (yellow stars and lines, respectively) for the eruptions within Table 2. Dashed lines represent the boundaries between sectors A, B and C (discussed in the text), VDB = Valle del Bove, VDL = Valle del Leone and VC = Val Calanna

The East flank of Mt. Etna is dominated by the large collapse feature of the Valle del Bove (Guest et al. 1984) and smaller Valle del Leone. The 19 eruptions with vents/fissures located within the Valle del Bove and the one eruption within the Valle del Leone are identified as ‘VDB’ or ‘VDL’ in the location column of Table 2; however, for the remainder of this paper, the Valle del Leone eruption (#56, Table 2) will be grouped with the Valle del Bove eruptions and referred to as such.

The April 1971 eruption (#42, Table 2) was a complex flank eruption (Tanguy et al. 2007). The activity occurred at three vents on the upper South flank and a series of vents on the East flank of the volcano within the Valle del Bove and extending onto the NE flank (Branca and Del Carlo 2004; 2005; Tanguy et al. 2007; Le Guern 1972). Despite the varying location of activity during this eruption, and its association with the early formation of the summit’s South-East crater, it is included here as one event with a duration of 68 days on the ENE flank.

The May 1879 and October 2002 eruptions (#27 and #59, Table 2) both involved more than one vent located on different flanks of the volcano. Here, the vent which was active for each eruption’s entire duration is used, although the erupted material from both vents is shown on the map in Fig. 1. Precise vent locations could not be found for two of the eruptions in Table 2 (#8 and #45); however, examination of the literature and careful location of their erupted products has given enough evidence to assign approximate locations for these eruptions, with both eruptions #8 and #45 affecting the North–North–East region of the volcano.

The completeness of the historical record

The completeness of the eruption record requires some consideration when investigating past eruptive activity. It is important to recognise that some eruptions may have gone unnoticed or unrecorded entirely and that as a result our data (Table 2) is a sample of recorded eruptions only. The recording of Mt. Etna’s eruptive activity dates back to Greek and Roman epochs (Branca and Del Carlo 2004; 2005; Tanguy et al. 2007). However, the records are often only considered to be complete after 1600 AD (Mulargia et al. 1985; Behncke and Neri 2003; Branca and Del Carlo 2004, 2005; Behncke et al. 2005; Tanguy et al. 2007; Cappello et al. 2013). Figure 2a shows an apparent increase in eruption frequency since 1300 AD which is most probably an artefact of reporting. Prior to 1600 AD, data are scarce, and eruptions are often excluded due to insufficient information regarding their duration. Following 1600 AD, the steepness of the curve increases and fewer eruptions are excluded due to the dataset becoming a more complete representation of flank activity at Mt. Etna. All flank eruptions after 1970 have accurately known durations.

Fig. 2

a Plot of cumulative eruption number against eruption start year of all 77 flank eruptions reported between 1300 and 2010. Pale symbols represent the 15 eruptions excluded from this study due to insufficient information regarding their start and/or end date. b Plot of eruption duration (on a log scale) against start year for the 62 eruptions included in this study (Table 2). Vertical dashed lines in both plots represent the years 1600, 1670 and 1971

Figure 2b shows that this increased reporting of eruptions with time is accompanied by an increase in the number of reported eruptions with short durations. This may suggest that the early eruption record is biased towards eruptions which made the most impact on surrounding areas (Andronico and Lodato 2005). This reporting bias appears to reduce during the eighteenth century (Fig. 2b) and may reflect a shift towards more modern approaches in observing and documenting volcanic activity after the large 1669 flank eruption (Branca and Del Carlo 2004; 2005).

A regional bias in the quality and completeness of eruption records may also exist on Mt. Etna. The volcano’s Western flank appears to have experienced fewer flank eruptions than other areas of the volcano (Fig. 1). Geological maps of Mt. Etna (Romano et al. 1979; Branca et al. 2011) show more lava flows on this flank than are represented in this study; however, these are either a result of eruptions prior to 1300 AD, and therefore outside the range of this investigation, or have undocumented eruption years. Although the reduced number of eruptions, especially in recent years, from vents located on Mt. Etna’s West flank may reflect a preference for eruptive vents to open on other flanks, some of this may be a reporting bias due to the Western flank being the least populated region of Mt. Etna (Behncke et al. 2005). Similarly, 95 % of the reported eruptions within the uninhabited and poorly accessible Valle del Bove post-date 1600 AD (Table 2), which may reflect a reporting bias here too.

Data before 1600 AD may be a poor representation of Mt. Etna’s activity due to the reporting biases discussed and therefore cannot be used to make reliable forecasts about future activity. Data from before 1600 AD has therefore been excluded from the analyses in the remainder of this paper.

Statistical analysis

Survivor functions

The duration of a volcanic eruption can be considered as a type of survival time measurement. Survival analysis was first employed as a method of costing insurance premiums. It is now commonly used in medical studies to assess the length of remission following different treatments or in engineering situations to investigate the length of time before failure of an appliance or system (Machin et al. 2006). As with these types of data, eruption duration can be displayed graphically in an empirical survivor function plot, constructed by placing the observed durations ($x_i$) in rank order so that $x_1 \le x_2 \le \ldots \le x_N$, where $N$ is the total number of observations. The empirical survivor function ($\hat{F}(x_i)$) is then plotted at duration $x_i$, where

$$\hat{F}(x_{i}) = \frac{N-i}{N}, \qquad i=1,\ldots,N. $$
(1)
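Equation 1 can be evaluated with a few lines of code. The sketch below is illustrative only: it uses five durations that are quoted elsewhere in the text (0.5, 6, 12, 68 and 473 days) rather than the full Table 2 dataset, and the function name is an assumption.

```python
def empirical_survivor(durations):
    """Return [(x_i, (N - i) / N)] for the ranked durations, following Eq. 1."""
    xs = sorted(durations)
    n = len(xs)
    return [(x, (n - i) / n) for i, x in enumerate(xs, start=1)]

# Durations (days) quoted in the text; not the complete dataset.
curve = empirical_survivor([12, 473, 0.5, 68, 6])
for x, s in curve:
    print(x, s)
```

Each pair gives a duration and the proportion of eruptions in the sample that lasted longer; plotting the pairs with duration on a log scale gives the empirical survivor function curve.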

The resultant empirical survivor function curve provides information about the survival experience of that dataset. Typically these curves have an inverse ‘S’ shape with shallow gradient tails to the distribution representing rarer events with unusually long or short durations and a steeper central portion where the majority of eruption durations plot. Figure 3 shows the empirical survivor function curve for preferred eruption duration data between the years 1600 to 2010 along with curves for the maximum and minimum possible eruption durations, derived from individual eruption duration uncertainty (discussed previously and reported in Table 2). This plot demonstrates that the overall shapes and positions of the three empirical survivor function curves are very similar, implying that individual eruption duration uncertainty has a negligible effect on the overall distribution of the data.

Fig. 3

Empirical survivor function curves for the 1600–2010 preferred eruption durations and their maximum and minimum possible eruption durations when uncertainty is taken into account (data from Table 2)

Temporal variation in eruption duration

A fundamental assumption of any investigation using historical eruption data as an insight into future activity is that the character of past eruptions is a good indicator of the volcano’s future activity (Chester et al. 1985; Behncke and Neri 2003; Behncke et al. 2005; Cappello et al. 2013). The following section considers the appropriateness of this assumption to the Mt. Etna data in Table 2.

The distribution of eruption duration between 1600 and 1669 is dominated by long duration eruptions, three of which are longer than any subsequent eruption (Fig. 2b). During this time, erupted lavas were rich in plagioclase phenocrysts and believed to have been stored in a shallow magma reservoir within the volcanic edifice prior to eruption. However, directly following the 1669 eruption Mt. Etna experienced a sharp decrease in productivity and a reduction in the phenocryst content of erupted lavas, which has been attributed to the draining of a shallow magma reservoir within the volcanic edifice during the seventeenth century (Hughes et al. 1990; Behncke and Neri 2003). It is possible that the shallow magma chamber existing at this time promoted longer duration eruptions.

After 1669, eruption durations range from 0.5 to 473 days and there has been a general increase in eruption frequency with time that is not an artefact of reporting (Behncke and Neri 2003; Behncke et al. 2005; Branca and Del Carlo 2005; Cappello et al. 2013). In particular, dramatic increases in eruption frequency and output rate have been recognised following 1971 (Andronico and Lodato 2005; Behncke et al. 2005; Branca and Del Carlo 2005; Smethurst et al. 2009; Cappello et al. 2013). A similar trend can be observed in our data (Table 2), with 20 flank eruptions in the past 38 years (1971–2010), as opposed to only 7 in the 41 years before it (1930–1971) (Fig. 2, Table 2). The increased frequency of eruptions following 1971 is accompanied by a reduction in short duration eruptions, with reported eruption durations of less than 6 days being absent after this time (Fig. 2b). Median eruption durations for these three time periods are 190 days (1600 to 1669), 24 days (1670–1971) and 50 days (1972–2010).

Figure 4 shows empirical survivor function curves for the eruption durations of these three time periods. The 1670 to 1971 and 1972 to 2010 datasets diverge at durations less than 10 days (Fig. 4). If such variation in eruption duration distribution is significant, it could indicate a change in the dynamics of the volcanic system at c. 1971 in such a way that discourages short duration eruptions, thus reducing their likelihood in the future. This implies that using the whole dataset of post-1669 eruptions would be an unrealistic representation of future activity, and that it might be more appropriate to use the 1972–2010 subset of the data. However, a Mantel–Haenszel Logrank test (Appendix A; Machin et al. 2006) indicates that the curves are not statistically different at the 0.05 level and it cannot be concluded that they derive from different distributions (test statistic = 2 on 1 degree of freedom). For forecasting future eruption durations on the basis of past eruptions this implies that restricting the input data to eruptions from 1972 to 2010 is currently unnecessary.
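For readers unfamiliar with the logrank test, a minimal sketch of a two-sample version is given here. It assumes that every eruption has ended (no censored observations) and uses the hypergeometric variance form of the Mantel–Haenszel statistic; the exact formulation in Appendix A may differ, and the two samples are hypothetical values rather than Table 2 data. The statistic is compared with 3.84, the 0.05 critical value of the chi-squared distribution on 1 degree of freedom.

```python
def logrank_statistic(group1, group2):
    """Chi-squared logrank statistic (1 df) for two uncensored samples."""
    times = sorted(set(group1) | set(group2))
    o_minus_e = 0.0   # observed minus expected endings in group 1
    variance = 0.0
    for t in times:
        n1 = sum(1 for x in group1 if x >= t)   # group 1 still erupting before t
        n2 = sum(1 for x in group2 if x >= t)
        d1 = sum(1 for x in group1 if x == t)   # group 1 eruptions ending at t
        d2 = sum(1 for x in group2 if x == t)
        n, d = n1 + n2, d1 + d2
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            variance += n1 * n2 * d * (n - d) / (n * n * (n - 1))
    return o_minus_e ** 2 / variance

CRITICAL_0_05 = 3.84  # chi-squared, 1 degree of freedom

# Hypothetical durations (days) for two well-separated samples.
short = [1, 2, 3, 4, 5, 6]
long_ = [10, 20, 30, 40, 50, 60]
stat = logrank_statistic(short, long_)
print(stat > CRITICAL_0_05)  # True: the two samples differ at the 0.05 level
```

Under this criterion, a test statistic of 2 falls below the critical value, so the null hypothesis of a common distribution is retained.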

Fig. 4

Empirical survivor function curves for eruption durations from 1600 to 1669 (n = 7), 1670–1971 (n = 31) and 1972–2010 (n = 20) (data from Table 2)

In contrast, the empirical survivor function curve for the 1600–1669 dataset is entirely offset from the 1670–1971 and 1972–2010 curves (Fig. 4) and a Mantel–Haenszel Logrank test (Appendix A; Machin et al. 2006) indicates that this offset is statistically significant at the 0.05 level (test statistic = 7 and 5.3 on 1 degree of freedom, respectively). This clear difference and the evidence for a different plumbing system beneath Mt. Etna prior to 1670 may indicate that a future eruption of this scale and duration is unlikely and therefore that we should only use eruptions after 1669 as the basis of any forecasting models. However, the 1600–1669 time period has previously been interpreted as the culminating phase of a century-scale cycle in eruptive activity at Mt. Etna, with the next cycle still continuing today (Behncke and Neri 2003; Tanguy et al. 2003; Cappello et al. 2013). Recent investigations into the plumbing system of Mt. Etna indicate increasing magma accumulation beneath the volcano (Behncke and Neri 2003; Patané et al. 2003; Allard et al. 2006). This, along with the trend of increasing eruption frequency and output rate, may indicate a gradual return to the style of activity that was typical in the early seventeenth century which Behncke and Neri (2003) ascribed to the ending of a century-scale cycle of activity. By excluding the 1600–1669 data, the model would be unable to account for the possibility that activity at Mt. Etna could become more voluminous and potentially hazardous in the future. We will compare forecasting models using both the 1600–2010 and 1670–2010 datasets later.

Sectoral variation in eruption duration

Previous investigations into the location of historical flank eruptions at Mt. Etna have highlighted three regions of high vent density on the North-Eastern, Southern and Western flanks of the volcano interpreted as three rift zones where eruptions are common (Duncan et al. 1981; Chester et al. 1985; Behncke et al. 2005; Neri et al. 2011; Proietti et al. 2011). To assess whether the distribution of eruption duration varies between each rift zone, we have split the volcano into three sectors. Unlike Proietti et al. (2011), our sectors are not evenly distributed or positioned so that one boundary is directed North. Instead, we have used similar sectors to Behncke et al. (2005) whereby each sector contains one of the three identified rift zones along with any vents which appear closely associated with it. Using a point centred above the summit, these are between (A) 347° and 104°, (B) 104° and 226° and (C) 226° and 347° (Fig. 1), and include the North-Eastern, Southern and Western rift zones, respectively.
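The sector boundaries translate into a simple classification of vent azimuths, sketched below. Azimuths are assumed to be measured clockwise from North at the summit, and the treatment of a vent lying exactly on a boundary (assigned to the sector that starts at that bearing) is an assumption made for illustration; in this study, vents or fissures close to a boundary were assigned individually.

```python
def sector(azimuth_deg):
    """Return 'A', 'B' or 'C' for a vent azimuth in degrees from North.

    A: 347 to 104 (wraps through North), B: 104 to 226, C: 226 to 347.
    ASSUMPTION: each lower boundary is inclusive.
    """
    a = azimuth_deg % 360
    if a >= 347 or a < 104:
        return "A"
    if a < 226:
        return "B"
    return "C"

for az in (45, 180, 270, 350):
    print(az, sector(az))
```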

The boundary between sectors A and B cuts through the Valle del Bove. Eruptions within this area are common and, since 1971, many lava flows from the summit’s South East crater enter this valley making the resurfacing rate high such that identifying vents and fissures within this area can be difficult. The precise positions of the 1755 and 1802 fissures (#13 and #19, Table 2) are unknown but reported to be close to Rocca Mussarra and are therefore considered here as part of sector A. Other fissures and vents within the Valle del Bove have been located using the sources previously discussed and assigned to sectors A or B accordingly.

The majority of eruptive vents and fissures outside of the Valle del Bove fall clearly within one of the three sectors (Fig. 1). The March 1981 eruption (#51, Table 2) was the result of a long fissure which crosses the boundary between sectors A and C. The eruption is most probably a result of the North–East rift zone and is therefore considered part of sector A (Fig. 1). Similarly, the eruptive fissure of the May 2008 eruption (#62, Table 2) crosses the boundary between sectors A and B. The lower portion of this fissure was active throughout the eruption and thus the eruption is attributed here to sector B (Fig. 1).

Empirical survivor function curves plotted for the 1600 to 2010 eruptions in sectors A, B and C are displayed in Fig. 5. The small sample size of sector C (n = 6) results in a crude empirical survivor function curve and any differences between its eruption duration distribution and that of sectors A and B are difficult to discern. The sample sizes of sectors A and B are higher (n = 23 and n = 29, respectively) and while the tails of their distributions overlap, the central portions diverge, with median durations of 18 days (sector A) and 84 days (sector B) (Fig. 5). To assess whether these differences are significant, Mantel–Haenszel Logrank tests have been performed on all possible combinations of sector pairs (i.e. A–B, A–C and B–C) and the results are summarised in Table 3. Despite the median duration of sector B (84 days) being higher than that for sectors A and C (18 and 19.5 days, respectively), the distributions cannot be considered statistically different at the 0.05 level. For sector pair A–B, a Mann–Whitney test and t test (applied to the logs of the data) were also performed, with similar results (p values of 0.213 and 0.371, respectively). It can therefore be concluded that despite the observable differences in the central portion of the empirical survivor function curves (Fig. 5), we cannot reject the null hypothesis that there is no difference between the shapes of the eruption duration distribution of sectors A and B. This is likely to be due to the relatively small number of eruptions, in statistical terms, in each sector.

Fig. 5

Empirical survivor function curves for eruption durations within sectors A (n = 23), B (n = 29) and C (n = 6) between the years 1600 and 2010 (data from Table 2)

Table 3 Mantel–Haenszel Logrank test results for all possible sector pairs

Forecasting the duration of future flank eruptions

Description of the statistical model

When duration data are modelled using theoretical distributions, survival analysis can be used to estimate the probability that a future eruption will exceed a given length of time. The probabilistic forecasts are based on best-fit parametric statistical models of empirical survivor functions. The two-parameter log-logistic and the three-parameter Burr type XII distributions have been considered and their survivor functions are

$$\hat{F}(x)\,_{\mathrm{(Log-logistic)}} = \frac{1}{1+(x/\sigma)^{\beta}} $$
(2)
$$\hat{F}(x)\,_{\mathrm{(Burr\,XII)}} = \frac{1}{\{1+(x/\sigma)^{\beta}\}^{\alpha/\beta}} $$
(3)
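Equations 2 and 3 translate directly into code. The sketch below is illustrative only: the function names and the parameter values are placeholders, not fitted values. It also shows that the Burr type XII form reduces to the log-logistic form when α = β, which is why the two models can be compared as nested alternatives.

```python
def survivor_log_logistic(x, sigma, beta):
    """Eq. 2: probability that an eruption duration exceeds x."""
    return 1.0 / (1.0 + (x / sigma) ** beta)


def survivor_burr_xii(x, sigma, beta, alpha):
    """Eq. 3: probability that an eruption duration exceeds x."""
    return (1.0 + (x / sigma) ** beta) ** (-alpha / beta)


# Placeholder parameters (assumed for illustration only).
sigma, beta = 30.0, 1.2
for days in (7, 30, 365):
    ll = survivor_log_logistic(days, sigma, beta)
    # With alpha equal to beta, the Burr XII survivor function
    # coincides with the log-logistic one.
    burr = survivor_burr_xii(days, sigma, beta, alpha=beta)
    print(days, round(ll, 4), round(burr, 4))
```

Note that σ is the median of the log-logistic distribution, since the survivor function equals 0.5 at x = σ.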

To identify the best-fit log-logistic and Burr type XII survivor functions, their parameters (α, β and σ) have been found by maximum likelihood estimation and their goodness of fit to the observed duration data tested using a Kolmogorov–Smirnov test. If the Kolmogorov–Smirnov test results indicate that the observed duration data could have been derived from either distribution, a likelihood ratio chi-squared test is used to assess whether there is any benefit in employing the more complicated Burr type XII distribution or whether the simpler log-logistic distribution provides an equally good fit to the data. Additional information on these methods can be found in Appendix B.
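The checks described above can be sketched as follows, under stated assumptions. The maximum likelihood search itself is omitted and the parameters are treated as given; the sample is synthetic (evenly spaced quantiles of a log-logistic distribution with placeholder parameters), and the likelihood ratio p value assumes a chi-squared reference distribution with one degree of freedom for the single extra Burr type XII parameter. The authors' own procedure is the one in Appendix B; all function names here are hypothetical.

```python
import math


def log_logistic_cdf(x, sigma, beta):
    """Cumulative distribution function, 1 minus the survivor function of Eq. 2."""
    return 1.0 - 1.0 / (1.0 + (x / sigma) ** beta)


def log_logistic_loglik(data, sigma, beta):
    """Log-likelihood of the log-logistic density implied by Eq. 2."""
    total = 0.0
    for x in data:
        z = (x / sigma) ** beta
        # Density: f(x) = (beta / x) * z / (1 + z)^2
        total += math.log(beta / x) + math.log(z) - 2.0 * math.log1p(z)
    return total


def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF and a candidate model CDF."""
    xs = sorted(data)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        gap = max(gap, i / n - f, f - (i - 1) / n)
    return gap


def likelihood_ratio_p_value(loglik_simple, loglik_complex):
    """p value for one extra parameter (chi-squared, 1 d.o.f.)."""
    stat = max(0.0, 2.0 * (loglik_complex - loglik_simple))
    return math.erfc(math.sqrt(stat / 2.0))


# Synthetic sample; SIGMA and BETA are placeholders, not fitted values.
SIGMA, BETA, N = 40.0, 1.0, 50
sample = [SIGMA * (p / (1.0 - p)) ** (1.0 / BETA)
          for p in ((i - 0.5) / N for i in range(1, N + 1))]

d_stat = ks_statistic(sample, lambda x: log_logistic_cdf(x, SIGMA, BETA))
print("KS statistic:", round(d_stat, 4))
print("log-likelihood:", round(log_logistic_loglik(sample, SIGMA, BETA), 2))
```

A small Kolmogorov–Smirnov statistic indicates that the model cannot be rejected, and a likelihood ratio p value above the chosen significance level indicates no demonstrable benefit from the extra parameter.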

The best-fit survivor function can be used to make probabilistic forecasts about the duration of future and on-going volcanic eruptions. Three types of forecast are made in this investigation. The first is the probability of exceeding a specified duration x according to the survivor function given in Eq. 2 or 3. The second is a variation on the survivor function, adapted for on-going eruptions, in which the residual life function is used to find the probability of exceeding a specified total duration x, given that the eruption has already reached duration t (with x ≥ t). It is the ratio of the survivor function at x to that at t, and is given by

$$\hat{F}_{t}(x)\,_{\mathrm{(Log-logistic)}} = \frac{\sigma^{\beta}+t^{\beta}}{\sigma^{\beta}+x^{\beta}} $$
(4)
$$\hat{F}_{t}(x)\,_{\mathrm{(Burr\,XII)}} = \left(\frac{\sigma^{\beta} + t^{\beta}}{\sigma^{\beta} + x^{\beta}} \right)^{\alpha/\beta} $$
(5)
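Equations 4 and 5 are conditional probabilities: the survivor function at x divided by the survivor function at t. The sketch below implements both and checks that identity numerically; the function names and parameter values are placeholders chosen for illustration.

```python
def residual_life_log_logistic(x, t, sigma, beta):
    """Eq. 4: P(total duration > x | eruption has already lasted t)."""
    if x < t:
        raise ValueError("x must be at least the elapsed duration t")
    return (sigma ** beta + t ** beta) / (sigma ** beta + x ** beta)


def residual_life_burr_xii(x, t, sigma, beta, alpha):
    """Eq. 5: the same conditional probability for the Burr type XII model."""
    return residual_life_log_logistic(x, t, sigma, beta) ** (alpha / beta)


def survivor_log_logistic(x, sigma, beta):
    """Eq. 2, repeated here so the check below is self-contained."""
    return 1.0 / (1.0 + (x / sigma) ** beta)


# Placeholder parameters (assumed); t = 14 days elapsed so far.
sigma, beta, t = 30.0, 1.0, 14.0
for x in (14.0, 30.0, 90.0):
    direct = residual_life_log_logistic(x, t, sigma, beta)
    ratio = survivor_log_logistic(x, sigma, beta) / survivor_log_logistic(t, sigma, beta)
    print(x, round(direct, 4), round(ratio, 4))
```

At x = t the probability is 1 by construction, since the eruption is already known to have lasted that long.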

Finally, the quantile function given by

$$x_{p}\,_{\mathrm{(Log-logistic)}} = \sigma\left( \frac{p}{1-p} \right)^{1/\beta} $$
(6)
$$x_{p}\,_{\mathrm{(Burr\,XII)}} = \sigma \left\{\frac{1}{(1-p)^{\beta/\alpha}} -1 \right\}^{1/\beta} $$
(7)

enables the user to find the duration associated with a stated quantile p, that is, the duration that has probability 1 − p of being exceeded. For each forecast, the 95 and 80 % confidence intervals have been calculated using the methods given in Appendix C.
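The quantile functions in Eqs. 6 and 7 are the inverses of the survivor functions in Eqs. 2 and 3. The following sketch, with hypothetical function names and placeholder parameters, confirms the round trip: the duration returned for quantile p is exceeded with probability 1 − p.

```python
def quantile_log_logistic(p, sigma, beta):
    """Eq. 6: duration exceeded with probability 1 - p."""
    return sigma * (p / (1.0 - p)) ** (1.0 / beta)


def quantile_burr_xii(p, sigma, beta, alpha):
    """Eq. 7: duration exceeded with probability 1 - p."""
    return sigma * ((1.0 - p) ** (-beta / alpha) - 1.0) ** (1.0 / beta)


def survivor_burr_xii(x, sigma, beta, alpha):
    """Eq. 3, repeated here so the round-trip check is self-contained."""
    return (1.0 + (x / sigma) ** beta) ** (-alpha / beta)


# Placeholder parameters (assumed for illustration only).
sigma, beta, alpha = 30.0, 1.2, 0.9
for p in (0.1, 0.5, 0.9):
    x_p = quantile_burr_xii(p, sigma, beta, alpha)
    exceed = survivor_burr_xii(x_p, sigma, beta, alpha)
    print(p, round(x_p, 2), round(exceed, 4))
```

The confidence intervals of Appendix C are not reproduced in this sketch; it returns point estimates only.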

Application of the model to Mt. Etna

The above investigations have shown that differences in the distribution of eruption duration before and after 1971 and differences in the distribution of eruption duration on different sectors of Mt. Etna’s flanks are not statistically significant at the 0.05 level. This indicates that the eruption durations recorded between 1670 and 2010 could all have derived from the same distribution, and therefore it is acceptable to use these data in the forecasting model presented below. We have also demonstrated that the distribution of eruption duration between 1600 and 1669 is dominated by long duration eruptions, which may have been the result of a shallow magma reservoir existing beneath Mt. Etna at this time. A gradual return to this type of activity in the future has been proposed by Behncke and Neri (2003), so we have made eruption duration forecasts on two different datasets: 1600–2010 and 1670–2010. The 1600–2010 dataset allows us to account for the very long eruption durations that may occur in the future if a shallow magma reservoir were to be re-established. It contains a total of 58 observed eruption durations ranging from less than 1 day to 3,653 days with a median duration of 34.5 days (Table 2). The 1670–2010 dataset may give a more realistic forecast of eruption durations in the near future. This dataset contains 51 observed eruption durations ranging from less than 1 day to 473 days with a median duration of 26 days (Table 2).

For both the 1600–2010 and 1670–2010 datasets, the Kolmogorov–Smirnov goodness of fit test suggests that the observed durations could have been derived from either a log-logistic or a Burr type XII distribution. The additional likelihood ratio chi-squared tests indicate that there is no benefit in applying the Burr type XII distribution over the simpler log-logistic distribution. The best-fit log-logistic survivor functions have estimated parameter values of β = 0.94 and σ = 40.56 (1600–2010) and β = 1.00 and σ = 33.00 (1670–2010). The resultant survivor function curves are displayed graphically alongside their empirical survivor curves (Emp_SF) in Fig. 6.

Fig. 6 Empirical survivor function (Emp_SF) curves along with their best-fit log-logistic survivor function curves for historical flank eruption durations at Mt. Etna (data from Table 2) for (a) 1600–2010 (β = 0.94, σ = 40.56) and (b) 1670–2010 (β = 1.00, σ = 33.00)

Table 4 contains the results of six sets of forecasts made from the 1600–2010 and 1670–2010 datasets: two using the survivor function (a and b in Table 4), two using the residual life function where t is 14 days (c and d in Table 4) and two using the quantile function (e and f in Table 4). The values displayed in the first column of each table represent the scenario being forecast, e.g. the probability of an eruption exceeding 7 days or the duration associated with a p value of 0.34. The final two columns in each table represent the 95 and 80 % confidence intervals that have been calculated. When discussed in the text, 80 % confidence intervals are quoted.

Table 4 Forecast results for the 1600–2010 and 1670–2010 datasets

The shape and position of the two empirical survivor function curves in Fig. 6 are similar. The greatest difference is the prominent long duration tail of the empirical survivor function curve in Fig. 6a (1600–2010) which is absent in Fig. 6b (1670–2010). This is a result of the long duration eruptions which occurred between 1600 and 1669. The effect of this on the forecasting model results is that the probability of exceeding a given duration is consistently lower for the 1670–2010 dataset than the 1600–2010 dataset and that this difference is slightly greater when forecasting longer duration eruptions (Table 4). For example, when the 1600–2010 dataset is considered, results show an 84 % (± 5 %) probability of exceeding 1 week (7 days) and a 57 % (± 7 %) probability of exceeding 1 month (30 days). These probabilities are reduced to 82 and 52 % when the 1670–2010 dataset is considered (a and b in Table 4). A similar trend is also present in the results of the residual life function (c and d in Table 4).
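As a rough check, the point estimates quoted above can be recovered by substituting the rounded parameter values given in Fig. 6 into Eq. 2. The confidence intervals depend on the methods of Appendix C and are not reproduced here. For the 1600–2010 dataset (β = 0.94, σ = 40.56):

$$\hat{F}(7) = \frac{1}{1+(7/40.56)^{0.94}} \approx 0.84, \qquad \hat{F}(30) = \frac{1}{1+(30/40.56)^{0.94}} \approx 0.57 $$

For the 1670–2010 dataset (β = 1.00, σ = 33.00):

$$\hat{F}(7) = \frac{1}{1+7/33.00} \approx 0.825, \qquad \hat{F}(30) = \frac{1}{1+30/33.00} \approx 0.52 $$

The value of 0.825 sits on the rounding boundary between 82 and 83 %; the quoted 82 % presumably reflects the unrounded parameter estimates.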

The survivor function and residual life function both give the probability of exceeding stated durations. Perhaps more useful is the quantile function, which allows the user to identify durations associated with specific probabilities. Furthermore, the assignment of qualitative terms such as ‘likely’ and ‘unlikely’ to sensible probabilities makes the model results accessible to a wider audience. Here, we consider a ‘likely’ result as having a probability of 66 % or more, and an ‘unlikely’ result as having a probability of 33 % or less (following the approach taken in communicating climate change scenarios; Budescu et al. 2009; Mastrandrea et al. 2010). Because p is the probability of not exceeding a duration, these equate to values of p of 0.34 and 0.67, respectively. The results of such forecasts are shown in e and f of Table 4. Using the 1600–2010 dataset, results show a 66 % probability of exceeding 20 days (± 7 days) and a 33 % probability of exceeding 86 days (± 29 days) (e in Table 4); it can therefore be concluded that a future flank eruption on Mt. Etna is likely to exceed 20 days but unlikely to exceed 86 days. When the dataset is restricted to eruptions since 1669, these durations are reduced to 17 days (± 6 days) and 67 days (± 22 days), respectively (f in Table 4).
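These durations follow from Eq. 6 with the rounded parameter values given in Fig. 6, as the following worked substitution illustrates (point estimates only; the ± ranges come from the confidence interval methods of Appendix C). For the 1600–2010 dataset (β = 0.94, σ = 40.56):

$$x_{0.34} = 40.56\left( \frac{0.34}{0.66} \right)^{1/0.94} \approx 20\ \mathrm{days}, \qquad x_{0.67} = 40.56\left( \frac{0.67}{0.33} \right)^{1/0.94} \approx 86\ \mathrm{days} $$

For the 1670–2010 dataset (β = 1.00, σ = 33.00):

$$x_{0.34} = 33.00 \times \frac{0.34}{0.66} \approx 17\ \mathrm{days}, \qquad x_{0.67} = 33.00 \times \frac{0.67}{0.33} \approx 67\ \mathrm{days} $$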

Conclusions

We have introduced a probabilistic model for forecasting the duration of future and on-going eruptions using a new dataset of historical flank eruption durations from Mt. Etna. The model shows great potential for future use as a forecasting tool and could greatly benefit emergency response planning both prior to and during volcanic crises. It is not specific to Mt. Etna and can easily be adapted for use on other highly active, well-documented volcanoes or for different duration data such as the duration of explosive episodes or the duration of repose periods between eruptions. The model uses datasets of historical eruption durations and thus relies on past eruptions being a good indicator of future activity. It is therefore limited to use on volcanoes with well-documented historic eruptions and data must firstly be assessed for reporting biases and any changes in eruption duration with time or location.

Critical assessment of documented flank eruptions from Mt. Etna resulted in a reliable dataset of reported eruption durations between the years 1600 and 2010 containing 58 eruptions with reported durations ranging from less than 1 day to 3,653 days. Eruptions between the years 1600 and 1669 include the three longest duration flank eruptions reported at Mt. Etna. As a result, this time period is statistically different from that following it. Although this would usually be cause to exclude these data, a return to eruptions of this scale and duration in the future is conceivable. Other temporal variations in eruption duration were assessed but not found to be statistically significant. Furthermore, no significant differences were found in the distribution of eruption duration among the three prevailing rift zones on Mt. Etna (NE, S and W). However, there are indications of possible differences between the NE and S sectors that future data and/or other information might strengthen.

We chose to run the forecasting model on two datasets, 1600–2010 and 1670–2010, allowing us to assess the effect of including the longer duration 1600–1669 eruptions. Results indicate that the probability of exceeding a given duration is consistently lower for the 1670–2010 dataset; however, the difference is slight, especially where short durations are involved. Using the 1600–2010 dataset of historical flank eruption durations, and assigning the terms ‘likely’ and ‘unlikely’ to probabilities of 66 % or more and 33 % or less, respectively, the forecasting model indicates that a future flank eruption on Mt. Etna would be likely to exceed 20 days (± 7 days) and unlikely to exceed 86 days (± 29 days).