Background

Databases of large magnitude volcanic eruptions

Databases of large magnitude volcanic eruptions have been created to provide basic information about explosive volcanism (e.g. Machida and Arai, 2003; Committee for catalog of Quaternary volcanoes in Japan 2000; Siebert and Simkin, 2002; Siebert et al., 2010) and to facilitate assessment of hazards and risks from volcanic eruptions (e.g. Mason et al., 2004; Crosweller et al., 2012). Among these databases, the most accessible global databases of volcanic eruptions are the Smithsonian’s Global Volcanism Program database (Siebert and Simkin, 2002) and the Large Magnitude Explosive Volcanic Eruptions (LaMEVE) database (Crosweller et al., 2012). The Smithsonian’s Global Volcanism Program has compiled documentation of Holocene volcanism around the world for nearly five decades for better understanding of the full range of Earth’s eruptive activity, and to make these data available (on-line at http://www.volcano.si.edu/) to the ever-broadening community interested in volcanism (Siebert and Simkin, 2002). The database contains information about vent location, start and end dates of eruptions, dating method and Volcanic Explosivity Index (VEI; Newhall and Self 1982) of any magnitude of eruption known to have occurred during the past 10,000 years (Siebert and Simkin, 2002). The LaMEVE Quaternary database includes much older events (back to ~2.6 Ma; Crosweller et al., 2012). To support assessments of environmental and societal impacts of volcanism on global, regional and local scales and to provide basic information on global explosive volcanism, the LaMEVE database was created as a component of the Volcanic Global Risk Identification and Analysis Project (VOGRIPA) database, which is being developed as part of the Global Volcano Model (GVM) (Crosweller et al., 2012). The LaMEVE database aims to include all known explosive eruptions for which events are dated, source volcanoes are known and eruption magnitude (M = log 10 [erupted mass(kg)] - 7; Pyle 2000) and/or VEI are ≥ 4. The database contains information about the age of eruptions, deposit type, deposit volumes, VEI, eruption magnitude, intensity, basic geochemistry, source volcano location, data sources, errors and uncertainties with indices of data reliability (Crosweller et al., 2012). This database is publically available online (Crosweller et al., 2012; http://www.bgs.ac.uk/vogripa) and currently contains information on 3,107 Quaternary volcanoes and 1,887 explosive eruption records from the last 2.6 My.

Under-recording of events

Creation of databases of large magnitude explosive volcanic eruptions has clarified that under-recording of the frequency of eruptions and variable recording over time are problems (Deligne et al., 2010; Brown et al., 2014). Potential for under-recording directly impacts hazard and risk assessments (Bebbington 2013). A major challenge is to understand the degree of under-recording (Guttorp and Thompson, 1991; Deligne et al., 2010; Furlan, 2010; Brown et al., 2014).

Guttorp and Thompson (1991) studied patterns of volcanic activity during the last 500 years using the Smithsonian’s Global Volcanism Program database. Their results show that neither the global nor Japanese eruption records are significantly different from a Poisson process during this time interval. They estimated the reporting probability by smoothing the observed number of events with fitting a linear line to a locally chosen number of events, decided by cross-validation. Deligne et al. (2010) compiled a global dataset of Holocene large explosive volcanic eruptions from the Smithsonian’s Global Volcanism Program database and additional literature. They analyzed the data with the extreme value method, first applied in volcanology by Coles and Sparks (2006). In this method, under-recording of events is taken into account as a function of eruption magnitude. Their results indicate that the level of under-recording is high and fairly constant from the start of the Holocene until about 1 A.D. and then decreases dramatically over the last ~2000 years (Deligne et al., 2010). Furlan (2010) analyzed a catalogue of large eruptions that occurred during the last two millennia. To represent the temporal evolution of the censoring effect in the recording process effectively, the extreme value method was used with a change-point model. Like Coles and Sparks (2006), Furlan (2010) concluded that under-recording decreases dramatically in the most recent 400 years with increased recording of eruptions spreading throughout the South Pacific in this period. Under-reporting is significantly more pronounced during Holocene and historical time when data from small-volume eruptions (VEI 0-3) are also considered (Siebert et al., 2010). Brown et al. (2014) analyzed LaMEVE to show that under-recording is a function of magnitude and the proportion of historic and geological data in each magnitude subset. The time at which 50 % of the data are younger is a power law function of magnitude, which was attributed largely to preservation potential of deposits.

Although these analyses focused on the underrecording in different global databases, under-recording varies between regions due to variations in the length of the historic record, preservation of deposits, and number of geological investigations. For example, in the LaMEVE database, Japanese events account for about 39 % (729 out of 1,887 eruptions) of the entire set of eruptive events, whereas Japanese volcanoes only account for about 3.4 % (106 out of 3,107) of the number of volcanoes in the database (Brown et al., 2014).

In this paper we investigate under-recording of a large magnitude explosive eruption database in Japan in order to quantify under-recording of eruptions in regions where a comparatively large amount of geological data are available. We use a threshold of VEI or magnitude 4 to analyze under-reporting effects of eruptions with magnitudes large enough to have been documented both in historical and geologic records.

Methods

Data on Japanese explosive eruptions were compiled from the Smithsonian’s Global Volcanism Program database (Siebert and Simkin, 2002), additional Japanese databases and published articles. These data were used to populate LaMEVE. These additional Japanese databases are Machida and Arai (2003), Committee for Catalog of Quaternary volcanoes in Japan (2000), Geological Survey of Japan, AIST (2013) and Hayakawa (2010). Machida and Arai (2003) compiled tephra data into a catalog of tephra in and around Japan to provide basic information for identification and further investigations of regional tephra fallout deposits. As this database is focused on identification of regional tephra layers, it contains major mineral components, type and refractive index of volcanic glasses and type locality of tephra. Products of recent small eruptions are not described in detail. Committee for catalog of Quaternary volcanoes in Japan (2000) edited a Quaternary volcano catalog to understand temporal and spatial variations in magmatism in Japan. Like the database of Machida and Arai (2003), this database does not contain younger and smaller eruptions. In contrast to these databases, Geological Survey of Japan, AIST (2013) compiled published articles into an online 10,000-year eruption database of Japan. Compared to the databases of Machida and Arai (2003) and Committee for catalog of Quaternary volcanoes in Japan (2000), this database provides more details of younger and smaller eruptions, although information related to some major volcanoes is still under compilation. Moreover, Hayakawa (2010) organized online databases of 2000-year and one million-year tephra in Japan. The database contains more instances of tephra fallout deposits than other Japanese databases, but these additional units are often not described in terms of eruption age and volume.

For our analysis, we identified 700 explosive volcanic eruptions at 100 volcanoes in Japan from the LaMEVE database (Crosweller et al., 2012, downloaded August 11th 2014). The LaMEVE database is developing continuously. It not only collects newly reported events and volcanoes but also removes events and volcanoes when they are determined to be inaccurate or duplicates. The analyzed version of LaMEVE originally contained 729 eruptions from 106 Japanese volcanoes. A total of 29 events and 6 volcanoes were removed because their exact eruption sources were determined to be uncertain based on their original data source (Hayakawa, 2010). Since the analyzed LaMEVE dataset contains 1887 explosive eruptions and 3107 Quaternary volcanoes throughout the world, these 700 Japanese explosive eruptions and 100 Japanese volcanoes account for about 38 % (=100 × (729–29)/(1887–29)) and 3.2 % (=100 × (106–6)/(3107–6)) of the global records, respectively. Of these eruptions, two are older than 1.8 Ma and the oldest event is 2.25 Ma. A total of 686 eruptions are larger than or equal to VEI 4 (Table 1). Although the other 14 events are larger than or equal to eruption magnitude 4 (as defined by Pyle, 2000), their VEI value is 3. Similarly, a total of 699 eruptions are larger than or equal to eruption magnitude 4 (Table 2); eruption magnitude of the remaining one VEI 4 eruption is 3.7. Smaller eruptions account for the majority of the compiled data: VEI 4 and 5 eruptions account for 82.5 % of the total 686 eruptions greater than or equal to VEI 4 (Table 1) and 74.7 % of the total 699 eruptions (M ≥ 4) are smaller than magnitude 5.5 (Table 2). Brown et al. (2014) stated that analysis of the LaMEVE database at a resolution higher than one order of magnitude bins is not justified due to the large uncertainties in tephra volume estimation and, therefore, analyzed the data in bins of order of magnitude (M4-4.9, M5-5.9, M6-6.9, M7-7.9 and M8-8.9). In our analysis, we analyze the data every half an order of magnitude, because the influence of the uncertainty is less significant as the data are binned cumulatively, so that each bin contains eruptions larger than or equal to its eruption magnitude value (Table 2). Plots of cumulative number of events against time (Fig. 1a and b) are strongly non-linear with the event rate (the derivative of this relationship) - decreasing back in time. Event rate decreases more rapidly as VEI or magnitude increases (Fig. 1c and d). Therefore, it is clear that eruptions are not constantly recorded over long time periods, a feature also observed in global data (Brown et al., 2014).

Table 1 Number and percentage of the analyzed Japanese events for each Volcanic Explosivity Index (VEI) category
Table 2 Number and percentage of the analyzed Japanese events for each eruption magnitude category
Fig. 1
figure 1

Temporal distribution of recorded events. The records of smaller eruptions disappear very rapidly with time. a Cumulative number of events of each VEI categories with time. b Cumulative number of events of each eruption magnitude categories with time. c Survivor function (normalized cumulative number of events) of each VEI categories. d Survivor function (normalized cumulative number of events) of each eruption magnitude categories

Except the category of VEI 8 due to a very small number of events (9 events), the number of missing events given the apparent change in frequency with time is estimated in three steps: (1) calculation of time dependent recording rate of events for each VEI (4 to 7) and eruption magnitude (4.0 to 7.0) category, (2) fitting a function to the time series of the observed recording rate, (3) calculation of the percentage of missing events compared to the total number of events, with both estimated from the fitted function. The modeling assumes that there are no missing events at time zero (the present day) under the present global volcano monitoring systems and the present day recurrence rate (the eruption frequency; events/ka) is calculated as the intercept of the fitted function at time zero.

Recording rate calculation

For each VEI and eruption magnitude category (Tables 1 and 2), event recording rate, r dm, at the m th youngest event age, d m, is calculated using a moving average:

$$ {r}_{dm}=\frac{2n}{d_{m+n}-{d}_{m-n}} $$
(1)

where d m+n and d m-n represent the ages of the n th older and younger events than the m th youngest event, respectively, and 2n is the number of events representing a window size, MA, of the moving average calculation. Because the smoothness of the time series of the observed recording rate depends on the window size, multiple calculations with different window sizes are necessary. Although a smaller window size (e.g. MA = 2) is suitable to detect short term recording rate changes, it often gives a zero denominator in equation 1 because some consecutive events have the same age in the database. We therefore chose the window size (MA) of 4 (n = 2), 6 (n = 3) and 8 (n = 4) in this study. This is a simpler approach than the one adopted by Sanchez and Shcherbakov (2012), in which they used logarithmically increasing bin lengths to calculate normalized recording rate.

Function fitting to the observed recording rate

The time series of the calculated recording rate was modeled with two functions. The recording rate of the larger events appears to decrease exponentially as a function of eruption age (Fig. 2a, b and e). Therefore, an exponential decay function, R e (t), is used to model the trends.

Fig. 2
figure 2

Recording rate of different VEI categories: a VEI 7. b VEI 6. c VEI 5. d VEI 4. e M ≥ 7.0. f M ≥ 6.5. g M ≥ 6.0. h M ≥ 5.5. i M ≥ 5.0. j M ≥ 4.5. k M ≥ 4.0. Blue, green and red dots show the recording rate calculated with equation 1 with the window size of 8, 6 and 4, respectively. Solid lines are fitted exponential (a, b, e and f) and double exponential (c, d, g, h, i, j and k) functions (Table 3). The red solid line of (f) (M ≥ 6.5, MA = 4) and the blue solid line of (g) (M ≥ 6.0, MA = 8) are calculated as the average of both exponential and double-exponential functions weighted with their relative likelihoods (Table 3) because their AICc or AIC difference of the two models is less than 2. The recording rate of the datasets of M ≥ 4.0 (MA = 4, 6), M ≥ 4.5 (MA = 4) and M ≥ 5.0 (MA = 4) were not calculated because some of their consecutive events had the same ages and the denominator of equation 1 becomes zero

$$ {R}_e(t)={R}_0 \exp \left(-\lambda t\right) $$
(2)
$$ =r\lambda \exp \left(-\lambda t\right) $$
(3)
$$ =r{f}_e(t) $$
(4)

where R 0 = rλ is the intercept value of the recording rate at time zero, λ is the decay constant of the exponential decay function, f e (t) is the probability density function of an exponential distribution and r is a scaling factor to fit f e (t) to the time series of the observed recording rate. Parameter λ is obtained from maximization of the logarithmic likelihood function of λ:

$$ \mathrm{l}\mathrm{n}L\left(\uplambda \right)=N \ln \left(\uplambda \right)\ \hbox{-} \uplambda {\displaystyle \sum }{d}_m $$
(5)

where N is the number of observed recording rate in the time series. From this function, the maximum likelihood estimate of λ is obtained explicitly:

$$ \uplambda = \frac{1}{\widehat{d_m}} $$
(6)

where \( \widehat{d_m} \) is the mean age of the observed recording rate r dm . For the maximum likelihood estimate of λ,

$$ T = \frac{ \ln (2)}{\uplambda} $$
(7)

represents the half-life period of recording rate R(t).

Time series of smaller events show a more rapid drop followed by a slower decreasing trend (Fig. 2c, d and h-k). These two trends are difficult to be modeled with a single exponential function. In these cases, a double exponential decay function, R d (t), is fitted to the time series of recording rate to capture the two decreasing trends observed in the time series (Fig. 2c, d and h-k):

$$ {R}_d(t) = {R}_1 \exp \left(\hbox{-} {\uplambda}_1t\right) + {R}_2 \exp \left(\hbox{-} {\uplambda}_2t\right) $$
(8)
$$ = r\left\{p{\uplambda}_1 \exp \left(\hbox{-} {\uplambda}_1t\right) + \left(1\hbox{-} p\right){\uplambda}_2 \exp \left(\hbox{-} {\uplambda}_2t\right)\right\} $$
(9)
$$ = r\left\{{pf}_{e1}(t) + \left(1\hbox{-} p\right){f}_{e2}(t)\right\} $$
(10)
$$ =r{f}_d(t) $$
(11)

The two terms of the double exponential decay function, R d (t), are attributed to rapid and slow decay of under-recording processes. R 1 is the initial value of the recording rate of units, which disappear relatively quickly with time, R 2 is the initial value of the recording rate of units which are preserved for a longer time. The parameters λ 1 and λ 2 are decay constants of the two exponential decay segments; f e1 (t) and f e2 (t) are the probability density functions of two exponential distributions; f d (t) is the probability density function of a double exponential distribution; p and r are weighting and scaling factors, respectively. Parameters λ 1 , λ 2 and p can be estimated numerically from maximization of the logarithmic likelihood function of λ 1 , λ 2 and p:

$$ \mathrm{l}\mathrm{n}L\left({\uplambda}_1,\ {\uplambda}_2,p\right) = \ln \left[{\displaystyle \prod}\left\{p{\uplambda}_1 \exp \left(\hbox{-} {\uplambda}_1t\right) + \left(1\hbox{-} p\right){\uplambda}_2 \exp \left(\hbox{-} {\uplambda}_2t\right)\right\}\right] $$
(12)

For the estimated λ 1 and λ 2,

$$ {T}_1 = \frac{ \ln (2)}{\uplambda_1} $$
(13)
$$ {T}_2 = \frac{ \ln (2)}{\uplambda_2} $$
(14)

represent the half-life periods of rapid and slow decreasing processes, respectively.

The functions R e (t) and R d (t) are obtained by estimating the scale factor r after deciding the probability density functions, f e (t) and f d (t) with the decay constants, λ, λ 1 and λ 2 and the weighting factor p. The scale factor r is estimated by minimizing the square error between the probability density function and the time series of observed recording rate in logarithmic scale (Fig. 2a-k).

The critical assumption here is that the recurrence rate of explosive eruptions in Japan was constant during the last 2 Ma. This assumption is plausible because the tectonic setting of arc volcanism has not changed significantly in Japan during most of the Quaternary (e.g., Taira, 2001; Mahony et al., 2012).

Although other functions were also investigated to describe under-recording of events (Fig. 3), they are not suitable for our analysis. For example, Coles and Sparks (2006) and Deligne et al. (2010) introduced a function into the extreme value method to represent the probability of under-recording. Their under-recording probability function may represent the short term under-recording (~2,000 years, Coles and Sparks 2006; ~10,000 years, Deligne et al., 2010). However, we found that their under-recording probability function does not fit the long term dataset (Fig. 3). In addition, the Weibull distribution is often used to model the survivor rate (e.g. Pyke and Thompson, 1986); however, the divergence of the function at t = 0 does not provide an estimate of the current frequency of eruptions. The large recording rate calculated by the function at t ~ 0 (e.g. Fig. 3d, j and k) suggests the divergence of the function. Although the modified Omori’s law (Utsu, 1961) for aftershock decay in seismology shows better fitting than these two functions (Fig. 3), an advantage of our model is the clear physical meanings of parameters in an exponential decay model, specifically the initial value R 0 (present day recurrence rate) and the half-life T of the recording rate. Parameters of the double exponential decay model correspond to the initial values R 1 , R 2 and the half-life T 1 , T 2 of recording rates of the rapid and the more slowly decreasing processes. In the latter case, the sum of the initial values of the recording rates of the rapid and slower processes represents the present day recurrence rate R 0 (R 1 + R 2 ) of events (Table 3). On the other hand, the window size 2n (MA) for the moving average calculation (equation 1) and function are not uniquely constrained by our method. Therefore, multiple calculations with different window sizes and evaluation of models with corrected Akaike’s information criterion are necessary to compare the models.

Fig. 3
figure 3

Fitting of different functions: a VEI 7. b VEI 6. c VEI 5. d VEI 4. e M ≥ 7.0. f M ≥ 6.5. g M ≥ 6.0. h M ≥ 5.5. i M ≥ 5.0. j M ≥ 4.5. k M ≥ 4.0. Light blue dots show the observed recording rate calculated with equation 1 with the window size of 8. Blue lines are fitted exponential and double exponential functions (Table 3). Orange dashed lines are the under-recording probability function (Coles and Sparks 2006; Deligne et al., 2010). Red dashed lines are the modified Omori’s law. Green dashed lines are the Weibull distribution. The under-recording probability function, the modified Omori’s law and the Weibull distribution are fitted by minimizing the square error between the functions and the time series of observed recording rate (light blue dots) in logarithmic scale

Table 3 Results of model fitting to the time series of recording rate. MA: window size, 2n, of the moving average calculation (equation 1). M: eruption magnitude. Model: selected model function. Rejected models are also shown when AIC or AICc difference between the accepted and the rejected models is less than 2 (Burnham and Anderson, 2002). Dexp: double exponential decay function. Exp: exponential decay function. Lr: relative likelihood. Note that the relative likelihood of the smaller AIC or AICc model is equal to 1. Values in the parentheses show the range of 95 percentile confidence interval estimated by the bootstrap method (Efron 1979) with 10,000 replicates. The recording rate of the datasets of M ≥ 4.0 (MA = 4, 6), M ≥ 4.5 (MA = 4) and M ≥ 5.0 (MA = 4) were not calculated because some of their consecutive events had the same ages and the denominator of equation 1 becomes zero

For each VEI (4 to 7) and eruption magnitude (4.0 to 7.0) category, one of the two probability density functions, f e (t) and f d (t), is selected as a better function based on Akaike’s information criterion (AIC; Akaike, 1974):

$$ \mathrm{A}\mathrm{I}\mathrm{C}=\mathit{\hbox{-}}2\mathrm{l}\mathrm{n}L+ 2k $$
(15)

and corrected Akaike’s information criterion (AICc; Sugiura, 1978) for small sample size (here, sample size ≤ 50):

$$ \mathrm{AI}\mathrm{C}\mathrm{c} = \mathrm{A}\mathrm{I}\mathrm{C}+\frac{2k\left(k+1\right)}{N-k-1} $$
(16)

where k is the number of parameters in the model, N is the number of recording rate observations in the time series and lnL is the maximized value of the logarithmic likelihood function (equations 5 and 12). AICc was used for the model selection of eruption categories VEI 7 (29 events), M ≥ 6.5 (50 events) and M ≥ 7 (34 events). When the difference of the AIC or AICc values of the two models was larger than 2, the model with the smaller AIC or AICc was selected (Burnham and Anderson, 2002). For our analyses, this difference corresponds to a significance level of 4.6 % (Hudson, 1971; Zhuang et al., 2005). On the other hand, if the difference of the AIC or AICc values was less than 2, results of both models are shown in Table 3, as the models are not regarded as significantly different.

Calculation of the percentage of missing events

Based on the fitted function, R(t), the percentage of under-recording in T years is estimated from the number of recorded events, \( {\displaystyle {\int}_0^TR(t)}dt \), and the total number of events, TR(0), both estimated from the fitted function:

$$ Percentage\ of\ under\hbox{-} recording\ at\ time\ period\kern0.5em T = 100\kern0.5em \times \kern0.5em \left(1\hbox{-} \frac{{\displaystyle {\int}_0^TR(t)dt}}{TR(0)}\right) $$
(17)

When the AIC or AICc difference of the two models is less than 2, the under-recording was calculated as the weighted average of under-recordings, which are estimated from both exponential and double-exponential models and weighted with their relative likelihoods (Table 3), given by:

$$ {L}_r= \exp \left(\frac{A_s-{A}_l}{2}\right) $$
(18)

where A s and A l are the smaller and larger AIC or AICc values, respectively (Burnham and Anderson 2002). Note that the relative likelihood of the smaller AIC or AICc model is equal to 1 in this equation (Table 3).

Results

The results of model fitting, AIC and AICc model selection suggest that the time series of the recording rate of smaller and larger eruptions can be described by double exponential decay model and exponential decay model, respectively (Fig. 2, Table 3). Good fitting of the functions to the observed recording rates is indicated visually by the small (less than one log unit) residuals (Fig. 4). Eruptions are not constantly recorded over long time periods (Fig. 1a and c). For instance, the VEI 4 recording rate decreases very rapidly with time (Fig. 2d, half-life is about 5600 years; Table 3) and only 4 of 293 VEI 4 eruptions are older than or equal to 1 Ma (Fig. 1a). The recording rate of smaller events, therefore, shows a rapid decreasing trend in the comparatively recent part of the record, and a slight decreasing trend in the older part of the record, suitable to be modeled by the double exponential decay function (Fig. 2c, d and h-k). For example, originally high initial recording rates (recurrence rate) of about 22–25 events/ka (VEI 4; MA = 4, 6, 8), 3.2–3.5 events/ka (VEI 5; MA = 4, 6, 8), 26 events/ka (M ≥ 4; MA = 8) and 3.4–3.5 events/ka (M ≥ 5; MA = 6, 8) decreases very rapidly with the half-lives of the recording rate of 5.56–5.63 ka (VEI 4; MA = 4, 6, 8), 39.2–40.2 ka (VEI 5; MA = 4, 6, 8), 7.69 ka (M ≥ 4; MA = 8) and 53.0–53.3 ka (M ≥ 5; MA = 6, 8) (Table 3). At older times, the data are characterized by a much less rapid decrease in the recording rate, indicating a different, slower process. Here the half-lives of the recording rate of 101–112 ka (VEI 4; MA = 4, 6, 8), 181–202 ka (VEI 5; MA = 4, 6, 8), 163 ka (M ≥ 4; MA = 8) and 265–274 ka (M ≥ 5; MA = 6, 8) are found (Table 3) for time periods prior to 26–27 ka (VEI 4; MA = 4, 6, 8), 114–130 ka (VEI 5; MA = 4, 6, 8), 28 ka (M ≥ 4; MA = 8) and 167–173 ka (M ≥ 5; MA = 6, 8), respectively. The recording rates of VEI 4 (MA = 4, 6, 8), VEI 5 (MA = 4, 6, 8), M ≥ 4 (MA = 8) and M ≥ 5 (MA = 6, 8) at this transition are about 1.7 events/ka, 0.6–0.7 events/ka, 3.9 events/ka and 0.6 events/ka, respectively.

Fig. 4
figure 4

Residuals between the observed recording rate and fitted functions. Parameters of the fitted functions are shown in Table 3. The recording rate of the datasets of M ≥ 4.0 (MA = 4, 6), M ≥ 4.5 (MA = 4) and M ≥ 5.0 (MA = 4) were not calculated because some of their consecutive events had the same ages and the denominator of equation 1 becomes zero

The recording rate of larger eruptions can be reasonably modeled with a single exponential function (Fig. 2a, b and e). For instance, the initial recording rates (recurrence rate) of VEI 6 (about 0.4 events/ka; MA = 4, 6, 8), VEI 7 (about 0.06 events/ka; MA = 4, 6, 8) and M ≥ 7 (about 0.06–0.07 events/ka; MA = 4, 6, 8) eruption categories decrease with the half-lives of 137–150 ka, 317–343 ka and 361–389 ka, respectively (Table 3). On the other hand, AIC or AICc difference between the fitted models is less than 2 in the cases of the M ≥ 6 dataset with MA = 8 and M ≥ 6.5 dataset with MA = 4 (Table 3) suggesting that these medium eruption magnitude categories might be represented by either the exponential or double exponential model (Burnham and Anderson, 2002). This transitional feature is not found in the analysis of VEI datasets but in the modeling of the eruption magnitude dataset, probably because the datasets of eruption magnitude are taken at smaller intervals (every half an order of magnitude) and also contain different magnitude eruptions, which are larger than each magnitude category. Furthermore, this transitional feature also indicates the importance of testing different window sizes in the moving average calculation in our analysis.

Figure 5 shows the increase of under-recording with time. Although the percentage is estimated with different window sizes (MA = 4, 6, 8) in the moving average calculation, it does not vary significantly, except in the case of the VEI 5 dataset. The events of smaller VEI (4 and 5) show an increase of under-recording with increasing time in the past. For example, 89 % of VEI 4 events (MA = 4, 6, 8) and 65–66 % of VEI 5 events (MA = 4, 6, 8) are missing from the record at 100 ka and 200 ka, respectively (Fig. 5a). Furthermore, even larger VEI events (VEI 6 and 7) show a significant amount of under-recording, although the increase of under-recording is reduced compared to the smaller VEI events. For instance, 46–49 % of VEI 6 events (MA = 4, 6, 8) and 36–39 % of VEI 7 events (MA = 4, 6, 8) are missing from the record at 300 ka and 500 ka, respectively (Fig. 5a). Similarly, eruption magnitude datasets including smaller eruption magnitude events, show relatively quick increase of under-recording. For example, 83 % of M ≥ 4 eruptions (MA = 8), 58 % of M ≥ 5 eruptions (MA = 6, 8), 46–54 % of M ≥ 6 eruptions (MA = 4, 6, 8), and 33–36 % of M ≥ 7 eruptions (MA = 4, 6, 8) are under-recorded in 100 ka, 200 ka, 300 ka, and 500 ka periods, respectively (Fig. 5b). The total number of eruptions is estimated from the fitted models. During the Quaternary (2.588 Ma; Gibbard et al., 2010), it is estimated that about 56,000–63,000 VEI 4 events (MA = 4, 6, 8), 8,200–9,000 VEI 5 events (MA = 4, 6, 8), 1,100–1,300 VEI 6 events (MA = 4, 6, 8), 150–180 VEI 7 events (MA = 4, 6, 8), 68,000 M ≥ 4 eruptions (MA = 8), 8,700–9,000 M ≥ 5 eruptions (MA = 6, 8), 1,300–1,900 M ≥ 6 eruptions (MA = 4, 6, 8) and 160–190 M ≥ 7 eruptions (MA = 4, 6, 8) occurred in Japan.

Fig. 5
figure 5

Percentage of estimated under-recording with time. a Analysis with VEI datasets. b Analysis with eruption magnitude datasets. The recording rate of the datasets of M ≥ 4.0 (MA = 4, 6), M ≥ 4.5 (MA = 4) and M ≥ 5.0 (MA = 4) were not calculated because some of their consecutive events had the same ages and the denominator of equation 1 becomes zero

Discussion

Under-recording of the global data

Under-recording of the global database can be estimated from the ratio between the Japanese and global eruption frequencies and the actual percentage of the Japanese eruptions in the global database. An almost perfect empirical linear relationship is reported between logarithmic frequency and VEI values by De la Cruz-Reina (1991):

$$ \log (F)=a- bI $$
(19)

where F is the eruption frequency (events/ka) and I is the VEI or eruption magnitude. When eruption magnitudes are taken instead of VEI values, the slope of the trend line, b, is equivalent to the “b-value” of Gutenberg-Richter law in seismology. Between VEI or eruption magnitude categories of I p and I q (I p  < I q ), the mean ratio of the global eruption frequency to the Japanese eruption frequency, r g , is:

$$ {r}_g = \frac{1}{I_q-{I}_p}{\displaystyle \underset{I_p}{\overset{I_q}{\int }}}\frac{10^{a_g-{b}_gI}}{10^{a_j-{b}_jI}}dI $$
(20)

where the subscripts g and j denote the parameters obtained from the global and the Japanese data, respectively. In our analysis, the Japanese eruption frequency (events/ka) of each VEI and magnitude category is calculated from the fitted recording rate models with t = 0 on the assumption that there is little possibility of current under-recording given present global volcano monitoring systems. Because the mechanism of very large explosive eruptions may be different from that of smaller ones (Deligne et al., 2010) and larger eruption categories (VEI 7, M ≥ 6.5 and M ≥ 7) do not have many records (Tables 1 and 2), the empirical linear relationship (equation 19) is estimated for the range of VEI 4 to VEI 6 and M ≥ 4 to M ≥ 6 in this study (Fig. 6 and Table 4). In comparison with the Japanese dataset, temporal patterns of global volcanism in the past 500 years yields a g  = 5.494 and b g  = 0.789 (Fig. 6a; De la Cruz-Reina, 1991). Although the time range of the analysis is short and under-recording of the global dataset is not considered, this result suggests that the eruption frequency of the global volcanism is 10.6–11.5 times more frequent than the Japanese eruptions between the VEI values of 4 to 7 (MA = 4, 6, 8; e.g. Fig. 6a), or 8.7–9.4 % of global eruptions occurred in Japan. Comparison between this percentage and the actual percentage of Japanese eruptions in the global database (about 38 %) suggests that the under-recording of the global database is 4.0–4.3 times larger than estimated for the Japanese data. Pyle (1995) compiled explosive eruptions that occurred globally during the last 2000 years and analyzed their frequency-size relationship, time-averaged eruptive mass and thermal energy release. The frequencies of those eruptions, which are derived from longer time period for larger eruption categories, and the mean volume of those eruption categories (Pyle 1995) yield a g  = 6.126 and b g  = 0.86 for the linear relationship of logarithmic frequency and magnitude with an assumption of tephra density of 1000 kg/m3 (Fig. 6b). For the eruption magnitude categories of M ≥ 4 to M ≥ 6 (MA = 6, 8; e.g. Fig. 6b), global eruptions are about 21.1–23.2 times more frequent than the Japanese eruptions, and 4.3–4.7 % of eruptions in the world occurred in Japan. This percentage and the actual percentage of the Japanese eruptions in the global database (about 38 %) suggest that under-recording of the global database is 7.9–8.7 times larger than the Japanese dataset. Because the Japanese volcanoes accounts for about 3.2 % of the LaMEVE database, the results estimated with the Pyle (1995) approach are more concordant than the results obtained using the De la Cruz- Reyna (1991) approach.

Fig. 6
figure 6

Frequency of events of each VEI and eruption magnitude category. As an example, the case of moving average calculation with window size of 8 is shown. Error bar shows the range of 95 percentile confidence interval estimated by the bootstrap method (Efron 1979) with 10,000 replicates. a Result for VEI datasets; open squares show results of the Japanese data (this study); black squares show the results of global data (De la Cruz-Reina 1991). b Result for eruption magnitude datasets; open squares show results of the Japanese data (this study). The dashed line shows the linear relationship of logarithmic frequency and eruption magnitude compiled from Pyle (1995)

Table 4 Parameters of linear relationships between logarithmic frequency and VEI or eruption magnitude categories (equation 19). MA: window size, 2n, of the moving average calculation (equation 1). M: eruption magnitude. The linear relationships were obtained for the discrete points in the VEI range and M range. The intervals of points were 1 and 0.5 for VEI and eruption magnitude, respectively. Values in the parentheses show the range of 95 percentile confidence interval estimated by the bootstrap method (Efron 1979) with 10,000 replicates

The reason for many missing events

The results of our analyses suggest a large amount of under-recording of events including not only smaller (e.g. VEI 4 and 5) but also larger (e.g. VEI 6 and 7) magnitude eruptions. The under-recording of events is attributed to the absence of historical records, disappearance of tephra units from the geological record and or overlooking of events - that is, not identifying events that are preserved in the rock record. The main mechanisms of disappearance of tephra units are erosion (e.g. Lavigne 2004; Pierson et al., 2013) and alteration of tephra deposits (e.g. Pollard et al., 2003), burial of tephra deposits by younger deposits (Imura and Kobayashi, 2001) and disappearance of the source volcano itself due to burial (e.g. Kamata, 1989) or erosion (e.g. Machida et al., 1997).

The absence of a historical record is a significant reason for a recording rate decrease in the past 1,000 years. For example, Furlan (2010) showed that recording of large eruptions changed considerably at AD 1900 and 1600. These changes can be explained in terms of factors such as colonization, improvements in newspaper reporting, the telegraph and more recently the internet, and development of modern scientific approaches to reporting natural events. Because of this more rapid decrease in historical data compared with geological data, Brown et al. (2014) excluded eruptions less than 1,000 years BP from their analysis to understand geological data. In our analysis, the dataset includes both historical and geological data and shows the rapid decrease process (Fig. 2c, d and h-k). On the other hand, the time scale of this process is much longer (half-life >5,000 years; Table 3) than historical time scale. Therefore, the rapid decrease observed in our analysis must be mainly caused by geological processes. For instance, erosion of tephra deposits must be a significant mechanism responsible for the under-recording of smaller eruption volume events. Especially, recording rates of smaller events show the process of the rapid decrease (Fig. 2c, d and h-k) and suggests that erosion is more dominant during this process. Conversely, older smaller events show slower decreasing process of recording rate (long tail of recording rate, Fig. 2c, d and h-k) and suggest that some of the tephra deposits are preserved from the rapid erosion process as indicated by Brown et al. (2014). Alteration of tephra deposits, burial of tephra deposits by younger deposits and disappearance of eruption source due to burial or erosion must be the mechanism of this slower decreasing process. They must also affect the long term decrease of recording rate of those large eruptions (Fig. 2a, b and e). Furthermore, overlooking events may happen if geologists are biased toward reporting younger (e.g. active volcano < 10,000 years old) and larger events. Reporting bias must affect the identification of older smaller events more than the reporting of larger and younger events.

Volcanic hazard implications

In using an eruption database, under-recording of events must be taken into account to avoid underestimation of potential hazards and spurious inferences concerning how eruption rates have varied with time. Time averaged frequency of eruptions must be calculated by averaging the number of events of each VEI or eruption magnitude category over different time scales (e.g. Simkin, 1993; Pyle, 1995). The time averaged frequency of smaller eruptions will be underestimated when considering long time periods due to their relatively quick disappearance and the time averaged frequency of larger eruptions will be less reliable when considering short time periods due to the small number of eruptions. Our method estimates the half-life of eruption records, which will help to select the calculation time scale of time averaged frequency of each VEI or eruption magnitude category. In addition to the estimation of the regional under-recording of events, under-recording of events at individual volcanoes can be evaluated by other statistical approaches (e.g. Wang and Bebbington, 2012).

Conclusions

We compiled data on large magnitude explosive eruptions in Japan from the LaMEVE database (Crosweller et al., 2012) and investigated the under-recording of events. Brown et al. (2014) analyzed the spatial and temporal bias of the LaMEVE database and showed that under-recording is a function of magnitude and the proportion of historic and geological data in each magnitude subset. They showed that about 40 % of all recorded eruptions have occurred in Japan. Here, we studied the under-recording of the Japanese dataset and also estimated the under-recording of the global dataset on the basis of the estimated eruption frequency and the number of volcanoes. Although the recording rate of smaller events (VEI 4, 5 and eruption magnitude 4–5.5) are basically high, it drops very rapidly to a small value and then decreases slowly back in time. The model fitting, AIC and AICc model selection suggest that the two trends of smaller eruption categories (VEI 4 and VEI 5) can be represented with the double exponential decay model. The recording rates of larger eruptions (VEI 6, 7 and M ≥ 7) show a more slowly decreasing trend, which does not have a significant initial quick drop and which can be represented by a single exponential decay model. The total number of eruptions and the percentage of missing eruptions are estimated from the fitted models. Our results suggest that even larger VEI events have significant under-recording when considering time periods of hundreds of thousands of years. For example, 89 % of VEI 4 events, 65–66 % of VEI 5 events, 46–49 % of VEI 6 events and 36–39 % of VEI 7 events are missing from the record at 100 ka, 200 ka, 300 ka, and 500 ka, respectively. Comparison of frequencies of Japanese and global eruptions suggests that under-recording of the global database is 7.9–8.7 times larger than the Japanese dataset. Therefore, under-recording of events must be taken into account in long-term volcanic hazard assessments.