1 Introduction

Earthquakes can be induced by a wide variety of natural and anthropogenic processes. Foulger et al. (2018) developed a classification scheme that recognizes 16 categories of human-induced earthquakes resulting from activities ranging from groundwater removal for agriculture to nuclear weapons tests. They conducted a global review of such earthquakes and compiled a comprehensive database of cases, HiQuake, which is publicly available for download at https://inducedearthquakes.org/ (Foulger et al. 2018; Wilson et al. 2017).

The strength of the evidence that the earthquake sequences in HiQuake were human-induced varies from case to case. In some cases, the connection between earthquakes and the industrial activity is essentially undeniable; in others, it is unpersuasive. Thus, when the HiQuake database was developed, the question arose as to where to draw the line, i.e., which cases to include.

Opinions of stakeholders, including specialist seismologists, industry operators, government, and local citizens, can be expected to vary for any particular case. At the same time, there is no rigorous or industry-standard method for assessing the strength of evidence for proposed induced earthquakes (PIEs). No method can determine with certainty whether an earthquake sequence was human-induced or not.

As a consequence, the only rigorous way of deciding which cases to include in HiQuake was to include all cases that had been proposed on a scientific basis, irrespective of differing stakeholder opinions and the personal views of the HiQuake authors. This policy, however, brought with it the problem that HiQuake includes, without differentiation, cases spanning the full spectrum of credibility from strong to very weak.

In the present paper, we investigate ways of grading individual cases to provide information on the strength of support for human induction. Currently, the only method applicable to all cases, both historic and modern, is the questionnaire approach, which assesses expert opinion. Several questionnaires have already been published (Davis and Frohlich 1993; Davis et al. 1995; Frohlich et al. 2016; Verdon et al. 2019), and it is clear that the approach suffers from subjectivity: different analysts may produce different results and, where the questionnaire is complex, it may take considerable time to apply.

Clearly, such a non-objective approach is not ideal, and work is in progress to develop more objective, physics- and statistics-based methods for discriminating induced from natural earthquakes. Schemes based on geomechanical models (Dahm et al. 2013), correlations of observed seismicity with industrial activities, and accurate earthquake focal mechanisms are being explored. However, these approaches require high-quality data, which are not available for cases prior to about 1980 and are not publicly available for most later cases. In addition, such approaches require lengthy modeling work and are impractical for the large number of cases in the HiQuake database.

Foulger et al. (2022) investigated the performance of different questionnaires. The objectives of that project were to (a) measure the degree to which the subjectivity of the questions affected the results, (b) identify ways of managing this, and (c) develop an effective tool that could be rapidly applied to the very large number of diverse cases in HiQuake (currently 1239 across 16 categories; see https://inducedearthquakes.org/ for the most up-to-date version). Three different schemes were developed and trialed.

The first scheme was a generalization of the most sophisticated existing scale for fluid-injection-related cases (the Generic Verdon (GV) scheme; see Verdon et al. 2019 for the original scheme). This was taken as a standard against which other schemes could be compared, and the minimum possible number of changes was made to generalize it. The second scheme was a fully subjective, simple, five-point Likert scale (Likert 1932) (the Strength of Case (SoC) scheme). (A Likert scale comprises a statement, e.g., “were the earthquakes induced,” and a list of possible responses. The analyst selects the response that most accurately reflects their opinion. Such scales are commonly used to assess consumer satisfaction with services, goods, etc.) The third scheme was a fully objective approach based on collating the data claimed to support a natural or induced origin (the Number of types of Evidence (NoE) scheme).

Foulger et al. (2022) trialed these three schemes on a subset of 55 large-MMAX cases from the HiQuake database. (MMAX is the magnitude of the largest earthquake in the sequence.) These included earthquake sequences proposed to have been induced by a wide range of industrial activities including coal bed methane extraction, conventional oil and gas production, shale-gas fracking, geothermal fluid production, groundwater extraction, mining (extraction of rocks and minerals), water, gas, and waste fluid injection, waste fluid disposal (including CO2), water reservoir impoundment, and construction. For any particular scheme, the repeatability of results between independent specialist analysts, expressed as the Pearson correlation coefficient R, was in the range 0.5–0.8 for most indicators of both dataset quality and strength of support for a human-induced origin. An encouraging result was that the simple five-point Likert scale, which took only a few minutes to apply, gave results that correlated at approximately R = 0.8 with those from the “standard” GV scale when the outputs of two analysts were averaged.

For the present paper, we built on this initial exploratory work. Our goal was to develop the best possible questionnaire scheme to apply to the 1239 cases in HiQuake. Such a scheme should ideally (a) be applicable to any earthquakes, natural or human-induced; (b) be generalized, simple, and uniform in style across the questions; (c) give reasonable results when compared with uncontroversial cases; (d) allow for all relevant information to be taken into account; (e) produce results that are simple and easily understood by non-seismologists; and (f) express the results in a way that does not imply unrealistic numerical precision. We aimed for a fully generalized simplified scale, built on the foundation of the GV scheme, designed to assess the strength of the evidence published, not the personal opinion of the analyst, and with the simplest possible point-allocation distribution between the questions.

Our final scheme for Evaluating Proposals of Induced Earthquakes (E-PIE) features nine questions, each with four possible, generic, simplified answer choices. These answers correspond to evidence (a) lacking, (b) supporting a natural origin, (c) equivocal, and (d) supporting a human-induced origin. A simplified point-allocation system that up-weights the most important factors, and down-weights data types with weak diagnostic power, produced the best results when applied to uncontroversial cases by multiple, independent analysts. The inclusion of two general questions enabled data not considered by the other questions to be taken into account. The results are visualized as a colored pie chart that avoids implications of unrealistic precision.

E-PIE was tested by five independent analysts on a set of 23 diverse cases from HiQuake. It performed well, yielding both good repeatability between analysts for strong cases and good agreement with the fluid-injection-specialized scheme of Verdon et al. (2019).

2 E-PIE: a new questionnaire for Evaluating Proposals of Induced Earthquakes

The scheme of Verdon et al. (2019) was the latest of several efforts over the last 30 years to perfect a questionnaire-based system for measuring the strength of evidence for induction of earthquakes by fluid injections. Foulger et al. (2022) adapted this scheme to the minimum extent possible to generalize it to other industrial processes. This work demonstrated that a scheme that is optimal for the full breadth of potential cases, and minimal in complexity, requires even more fundamental generalization. An ideal, general scheme cannot include detailed specifications because of the enormous diversity of earthquake-induction processes. It also needs to be general enough to be applied to new kinds of data and processes that have not yet been proposed but might emerge in the future. The onset of induced earthquakes often comes as a surprise, underlining the fact that the subject is still at an early stage of understanding. Ideally, the scheme should also be applicable to cases not proposed to be induced, to test whether further investigation might be warranted. Such a scheme might be useful to identify new human-induction mechanisms.

Issues that emerged from the work of Foulger et al. (2022) that needed to be addressed in the new scheme included:

  1. Several types of seismic data and results have been claimed to support human induction but have not been taken into account in earlier questionnaires.

  2. Many older, poorly monitored, or poorly documented PIE cases lack detailed data. This was dealt with in the scheme of Verdon et al. (2019) by applying the questionnaire twice, once for the completeness of the dataset and once for the strength of evidence for a natural or induced origin. The results are then expressed as two numbers. In order to incorporate data completeness into a scheme that expresses the results as a single visual, the situation where the data are missing or insufficient must be one of the possible questionnaire answers.

  3. Earlier questionnaires tend to interrogate the proximity of earthquakes, in time and space, to the industrial activity. However, the most pertinent issue is proximity to the region within which stress or other parameters associated with seismogenesis were industrially modulated. In the case of large or distributed industries, e.g., groundwater removal, this region may be considerably more extensive than the industry itself, and the time over which environmental modulation persists can be long.

Our new E-PIE questionnaire scheme is based on the structure and philosophy of the scheme of Verdon et al. (2019) but is extensively generalized. The question order has been revised for reasons of logic, e.g., so that the exit criterion comes first. The scheme was developed through a lengthy, iterative process that included testing several versions on a suite of 23 test cases (Table 1). Testing of each trial version was followed by panel discussion, adjustment of question and answer wordings, and re-application to the test cases. The challenge of designing a satisfactory one-size-fits-all questionnaire was formidable. Four analysts, all with research experience in earthquake seismology, contributed to this process.

Table 1 The 23 cases used to develop E-PIE and to which the final version was applied

The test cases used were chosen to provide a broad range of proposed induction mechanisms and plausibilities of human induction. Of these 23 cases, 17 are unique industrial activities. The remaining six cases comprise three UK sites, Preese Hall, Brockham, and Horse Hill, each studied at both early and late times, when different amounts of data were available. These three sites were analyzed in detail by Verdon et al. (2019) to illustrate and test their scheme, so we are able to compare the results from E-PIE with those of Verdon et al. (2019).

2.1 The E-PIE scheme

Our final scheme for Evaluating Proposals of Induced Earthquakes (E-PIE) is shown in Table 2. We retained the innovations of the scheme of Verdon et al. (2019) of weighting different data types differently and accounting for data that are missing or inadequate. E-PIE is considerably reduced in complexity compared with that scheme, however. Like the scheme of Verdon et al. (2019), E-PIE also anticipates that results are open to reappraisal should new relevant data become available.

Table 2 E-PIE—a scheme for Evaluating Proposals of Induced Earthquakes. Numbers in rightmost column indicate the number of points allocated to particular questions

E-PIE contains nine questions. For each of these questions, there are four possible answers corresponding to:

  a. no or insufficient information to answer the question (color code white)

  b. data support a natural origin (color code green)

  c. data equivocal (color code yellow)

  d. data support a human-induced origin (color code red)

Each question carries points which are assigned to the chosen answer. As recognized by Verdon et al. (2019), some types of data clearly have stronger diagnostic power than others. For example, if major industrial activity starts and earthquakes of a previously unusual nature onset quickly in close proximity, and correlate with variations in operations, little else may be needed for a convincing case that the earthquakes were induced.

We trialed several different point distributions. The final distribution allocates a total of 360 points. Of these, questions 2, 3, and 4 each carry 100 points (i.e., 100/360 = 27.8% of the total), and each of the other six questions carries 10 points (i.e., 10/360 = 2.8% of the total). Questions 2, 3, and 4 combined thus attract 83% of the points. The logic behind this is as described above: if there is close correlation in time and space with industrial activity, departing from previous behavior, little more is needed for the case for human induction to be persuasive.
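The point allocation can be checked with a few lines of arithmetic. This sketch simply encodes the distribution described above (100 points for each of Q2, Q3, and Q4, and 10 points for each other question); the dictionary name `POINTS` is illustrative.

```python
# Point allocation described in the text: Q2, Q3 and Q4 carry 100 points each;
# the remaining six questions (Q1 and Q5-Q9) carry 10 points each.
POINTS = {q: (100 if q in (2, 3, 4) else 10) for q in range(1, 10)}

total = sum(POINTS.values())
heavy = sum(POINTS[q] for q in (2, 3, 4))
light_questions = [q for q, p in POINTS.items() if p == 10]

print(total)                       # 360
print(len(light_questions))        # 6
print(f"{POINTS[2] / total:.1%}")  # 27.8% for one 100-point question
print(f"{POINTS[1] / total:.1%}")  # 2.8% for one 10-point question
print(f"{heavy / total:.1%}")      # 83.3% for Q2-Q4 combined
```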

2.2 Visualization of the results

After completion of the E-PIE questionnaire, points are totaled for each of the four possible answers a.–d. across all questions. A visual expression of the overall result is presented as a colored pie chart without the numerical data (examples are given in “Sect. 3”). Numbers are omitted because, for any qualitative questionnaire scheme, considerable variation in answer choices will occur between independent analysts. Foulger et al. (2022) quantified this variation in tests using several questionnaires and three independent analysts, and the same issue is confirmed in the present work, where five analysts also produced variable results using E-PIE (see “Sect. 4”). The pie-chart visual summary communicates the fundamental inexactitude of the results.
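The tallying step can be sketched as follows. The answer set used here is invented purely for illustration, and the function name `pie_fractions` is hypothetical. Because an answer of Q1b triggers the exit criterion (“Sect. 2.3.1”), the sketch returns no chart in that case. The printed percentages serve only to check the arithmetic; in the published visual the sectors are shown without numbers.

```python
from collections import Counter

# Points per question (Q2-Q4: 100 points; all other questions: 10 points).
POINTS = {q: (100 if q in (2, 3, 4) else 10) for q in range(1, 10)}
# Color code of each answer letter.
COLORS = {"a": "white", "b": "green", "c": "yellow", "d": "red"}


def pie_fractions(answers):
    """Return the fraction of points in each color sector, or None if Q1b
    (industry absent at the time of the PIEs) ends the questionnaire."""
    if answers.get(1) == "b":
        return None
    points_by_color = Counter()
    for question, letter in answers.items():
        points_by_color[COLORS[letter]] += POINTS[question]
    total = sum(POINTS[q] for q in answers)
    return {color: points_by_color[color] / total for color in COLORS.values()}


# An invented set of answers, for illustration only.
example = {1: "d", 2: "d", 3: "a", 4: "c", 5: "b", 6: "a", 7: "c", 8: "c", 9: "a"}
for color, share in pie_fractions(example).items():
    print(f"{color:>6}: {share:.0%}")
```

The resulting fractions could be passed to a standard pie-chart routine, e.g., matplotlib's `pie`, leaving its `autopct` option unset so that no percentages are drawn on the chart.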

A predominant red sector indicates major support for a human-induced origin. A preponderance of green indicates strong support for a natural origin. The size of the white sector expresses the proportion of desirable data that are unavailable. These data may become available in future at which time the analysis can be revised.

The yellow pie-chart sector indicates the proportion of data that are equivocal. Yellow does not imply partial support for either a natural or a human-induced origin and it should thus not be viewed as indicating, in some sense, support divided between the two. It is better considered to represent “no relevant data,” i.e., data do exist but are no more diagnostic than if they were absent.

It is interesting to reflect on the logic behind the information illustrated in the pie charts (Table 3):

  • Four out of the possible nine instances of answer b. correspond to a natural origin being required by the data (Q1, Q2, Q3, and Q7). The other five correspond to either a natural- or a human-induced origin being permitted.

  • All instances of answer c. correspond to either a natural- or a human-induced origin being permitted by the data.

  • Two out of the possible nine instances of answer d. correspond to a human-induced origin being required (Q4 and Q7). The other seven correspond to either a natural- or a human-induced origin being permitted.

Table 3 Logical implications of answer choices for the nine questions posed in E-PIE. Natural-R: natural origin required, Induced-R: human-induced origin required, Either-P: either natural or human-induced origin permitted

There is thus a logical asymmetry that implies seismic data potentially have more power to rule out a human-induced origin than to demonstrate that one is required. Of the 23 test cases we trialed, in no case did E-PIE return a logical conflict where different answers required natural- and human-induced origins.
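The consistency check described above can be expressed compactly. The sets below encode only the “required” implications listed in the bullet points (Natural-R for Q1b, Q2b, Q3b, and Q7b; Induced-R for Q4d and Q7d); every other question/answer pair is treated as Either-P. The function names are hypothetical.

```python
# Answers that logically require a natural origin (Natural-R) or a
# human-induced origin (Induced-R), as listed in the text; all other
# question/answer pairs permit either origin (Either-P).
NATURAL_REQUIRED = {(1, "b"), (2, "b"), (3, "b"), (7, "b")}
INDUCED_REQUIRED = {(4, "d"), (7, "d")}


def required_origins(answers):
    """Return the questions whose answers require each origin."""
    natural = sorted(q for q, a in answers.items() if (q, a) in NATURAL_REQUIRED)
    induced = sorted(q for q, a in answers.items() if (q, a) in INDUCED_REQUIRED)
    return natural, induced


def has_logical_conflict(answers):
    """True if some answers require a natural origin and others an induced one."""
    natural, induced = required_origins(answers)
    return bool(natural) and bool(induced)


# Invented answer sets, for illustration only.
consistent = {1: "d", 2: "d", 3: "d", 4: "d", 5: "c", 6: "c", 7: "c", 8: "c", 9: "a"}
conflicting = {1: "d", 2: "b", 3: "c", 4: "d", 5: "a", 6: "a", 7: "c", 8: "a", 9: "a"}
print(has_logical_conflict(consistent))   # False
print(has_logical_conflict(conflicting))  # True
```

Counting the pairs in the two sets also reproduces the asymmetry noted above: four answers can require a natural origin, but only two can require a human-induced one.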

Notwithstanding the need to avoid giving results to an unsupportable degree of precision, for purposes of utility it is desirable to assign numerical scores to cases in the HiQuake database. A suitable way of doing this might be to allocate scores of 1–5, with 1 representing very weak/highly unlikely cases and 5 representing very strong/highly likely cases. Such grades might have to be revised as new data become available, and to this end, metadata such as the date of the analysis should be recorded with each grade. Development of such a scheme is the subject of ongoing work (“Sect. 6”).

2.3 E-PIE structure, questions, and answers

In this section, we provide guidance for applying E-PIE. In keeping with the objective of providing a scheme that is fully general and applicable to the entire spectrum of cases, this guidance is general.

The E-PIE questionnaire comprises five blocks:

Block 1—the orientation block. E-PIE begins with orienting text concerning the plausibility of the proposed induction process and the spatial and temporal range of likely environmental modulation. The frequency of similar cases is likely to influence many answers. E-PIE does not contain point-scoring questions specific to these factors, but some questions require the analyst to have an opinion on these points.

The likely size of the area, and the depth range, physically modulated by the industry is a key consideration that is relevant to many of the questions, for example, questions 2, 3, 5, and 6. It is also a critical and much-debated issue that shapes opinions regarding the veracity of many cases and consequential operating decisions. Factors proposed to encourage earthquakes far from the causative industrial work include direct pore-pressure changes, the influence of fracture systems, poroelastic coupling, aseismic slip, and reactivation of pre-existing regional fault systems.

The latter issue is particularly important where unexpectedly large earthquakes are thought to have been induced. Geologically minor events, e.g., the injection of a relatively small volume of fluid, may apparently trigger earthquakes that are large in comparison and/or at unexpectedly large distances. An example is where fluid injections are proposed to have induced earthquakes much deeper than the formation into which the fluid was injected. A single such case might be considered unpersuasive, but multiple cases, including ones that are essentially undeniable, rightly should influence the results. Many such cases have been proposed for injection-related earthquakes in Oklahoma (e.g., McGarr and Barbour 2017).

As the science of human-induced earthquakes (and, indeed, earthquake seismology in general) has developed, theory has often followed observations rather than led to them. That industry may modulate the crust in ways capable of enabling earthquakes out to unexpectedly large distances has been demonstrated by persuasive examples, e.g., Paradox Valley, Colorado (Ake et al. 2005). As cases of this kind accumulate, scientists’ views regarding the plausibility of proposed induction mechanisms evolve. The best approach to making realistic assessments of the extent of physical modulation is thus a broad familiarity with the variety of PIE cases and avoidance of over-reliance on existing theory.

It is nevertheless clear that a particular industrial practice may induce observable earthquakes in some cases but not in others. For example, approximately 20% of water reservoirs deeper than 200 m induce earthquakes on impoundment, whereas the other 80% do not. Thus, although other similar cases may increase the plausibility of the process, each individual case must still be evaluated in isolation and on its own merits.

Block 2—question 1—exit criterion. Question 1 assesses whether an induced origin can be ruled out and continuation with the questionnaire abandoned. This is the case where the proposed causative industry did not exist or had not started at the time of the PIEs. Such a situation could arise where unusual earthquakes are proposed to have been induced by undisclosed work which is later found to have either not occurred or to have onset after the earthquakes. If the exit criterion is not selected, this block accounts for 2.8% of the total points.

Block 3—questions 2–4—the PIEs. This block investigates whether there is spatial and/or temporal correlation between the PIEs and the industrial activity. An example of such correlation is repeated bursts of earthquake activity at the time and location of injection events in boreholes. This block is the most influential in determining the final result as each of the three questions carries 100 points. Thus, the entire block accounts for 83% of the total.

Block 4—questions 5–6—pre-industrial earthquakes. This block investigates whether the area is prone to natural seismicity, a factor that would weaken the case for the PIEs having an induced origin. An example is a pre-industrial history of natural earthquakes in a geothermal area or evidence for Quaternary fault movements in craton interiors. This block accounts for 5.6% of the total points.

Block 5—questions 7–9—other data. This block captures relevant information not considered in the earlier questions. Examples include earthquake focal mechanisms, b-values, radical changes in the style of seismic stress release, and observations of co-seismic mine-gallery contraction. This block accounts for 8.3% of the total points.
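The block percentages follow directly from the point allocation. This sketch groups the questions by block (Block 1, the orientation block, carries no points) and reproduces the shares quoted above; the names `POINTS` and `BLOCKS` are illustrative.

```python
# Points per question (Q2-Q4: 100 points; all other questions: 10 points).
POINTS = {q: (100 if q in (2, 3, 4) else 10) for q in range(1, 10)}

# Point-scoring blocks and their questions; Block 1 (orientation) has none.
BLOCKS = {
    "Block 2 (Q1)": [1],
    "Block 3 (Q2-Q4)": [2, 3, 4],
    "Block 4 (Q5-Q6)": [5, 6],
    "Block 5 (Q7-Q9)": [7, 8, 9],
}

total = sum(POINTS.values())
shares = {name: sum(POINTS[q] for q in qs) / total for name, qs in BLOCKS.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
# Block 2 (Q1): 2.8%
# Block 3 (Q2-Q4): 83.3%
# Block 4 (Q5-Q6): 5.6%
# Block 5 (Q7-Q9): 8.3%
```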

2.3.1 Q1: PIEs–temporal: did the PIE sequence onset before, during, or after the industrial activity?—10 points

Question 1 assesses whether industrial activity is/was present without regard to distance. This question accommodates PIEs in the absence of knowledge regarding what industry might have induced them. Such a scenario might arise, for example, if a clandestine nuclear test is suspected, or if fluid injection information is lacking at the time of the PIEs because of commercial confidentiality. In those cases, Q1a would be selected. If subsequent information indicated that no industry had been underway at the time, an answer of Q1b would trigger exiting the scheme without the need for further consideration.

2.3.2 Q2: PIEs–epicenters: is there spatial collocation between the PIEs and the likely area of environmental modulation by the industrial activity?—100 points

Question 2 is informed by the analyst’s opinion regarding the size of the area that was plausibly modulated by the industrial activity. This area is specified, rather than simple proximity to the industrial work, because some processes, e.g., groundwater flow, can affect seismicity at considerable distances from operations over long time periods.

The analyst will take into consideration the individual case proposal, the abundance of cases of the same type, and the physical processes proposed. Fluid injection is one example: persuasive cases involve earthquakes extremely close to the injection point (e.g., Lei et al. 2019) or at distances of several kilometers (e.g., Ake et al. 2005).

2.3.3 Q3: PIEs–hypocenters: is there spatial collocation between the PIEs and the likely volume of environmental modulation by the industrial activity?—100 points

Question 3 is similar to Q2 but considers hypocentral depths. It is worded to be logically coupled to Q2 by considering the likely volume of environmental modulation. Considering volume, and not simply hypocentral depth, avoids the problem where earthquakes at appropriate depths but not in an appropriate area could trigger a pro-induced answer.

2.3.4 Q4: PIEs–temporal: is there temporal correlation between the PIEs and specific industrial events?—100 points

Question 4 considers the temporal relationship of the earthquakes and detailed industrial operational events. It is distinct from Q1, which is concerned only with the existence of industry at the relevant time and not variations in industrial activity.

Temporal relationships between earthquakes and industry are diverse. They include vigorous swarms that onset almost instantaneously with fluid injections in boreholes, earthquakes that gradually increase in number and magnitude over a period of years or decades, e.g., as geothermal or hydrocarbon reservoirs are depleted, and single large earthquake sequences proposed to have been induced by long-term groundwater depletion.

Such relationships are easier to assess where many earthquakes occur over a long period and clear and repeated correlations with industrial events are observed, e.g., at the Koyna Dam, India (e.g., Gupta 2002). They are more problematic to assess where industrial activity was long-term but only one or very few earthquake sequences are implicated. For example, the 2015 Gorkha, Nepal, M7.8 earthquake sequence was a single seismic episode clearly comprising a large release of tectonic stress that may have been hastened by additional stress loaded by long-term dewatering of the Indo-Gangetic plains (Kundu et al. 2015).

2.3.5 Q5: pre-industrial earthquakes–epicenters: is there evidence for pre-industrial earthquakes at or near the site of the PIEs?—10 points

Question 5 is designed to assess whether the PIEs occurred at a site, and/or within a wider region, known to be prone to natural earthquakes. Analyst judgment is required regarding how large the “wider region” reasonably is. This may be influenced by the envisaged spatial scale of industrial environmental modulation. The question wording is sufficiently general that non-seismic evidence may also be considered, e.g., from geomorphology, Quaternary geology, and oral history.

It is interesting to note that regulators generally consider areas prone to natural seismicity to be at higher risk of inducing earthquakes by industrial activity. However, the E-PIE questionnaire counts earthquakes prior to industrial activity as a factor that reduces the strength of cases for induction. This highlights a subtle point, which is that E-PIE does not seek to prove or disprove whether earthquakes were induced, but simply to assess the strength of scientific cases. Simply put, it may be easier to induce earthquakes in an already seismogenic area, but it is more difficult to build a scientific case for them. This reveals a curious tension between regulatory practice and the ability of scientific observations to discriminate.

2.3.6 Q6: pre-industrial earthquakes–hypocenters: is there evidence for pre-industrial earthquakes in the same volume as the PIEs?—10 points

Question 6 is similar to Q5 but considers hypocentral depths.

Questions 5 and 6 are to some extent correlated. For example, if Q5a is selected then Q6a almost invariably must also be selected. If Q5d is selected then Q6d is also required to be selected. Choices of Q5b or Q5c could be paired with various answers for Q6.
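These pairing rules lend themselves to a simple automated check, for example when several analysts' questionnaires are being collated. The function below is a hypothetical helper that encodes only the two rules stated above; pairings starting from Q5b or Q5c are left unconstrained.

```python
def q5_q6_warnings(q5, q6):
    """Flag Q5/Q6 answer pairs that break the pairing rules in the text."""
    warnings = []
    if q5 == "a" and q6 != "a":
        # A strong expectation rather than an absolute rule ("almost invariably").
        warnings.append("Q5a selected: Q6a would almost invariably be expected")
    if q5 == "d" and q6 != "d":
        warnings.append("Q5d selected: Q6d is required")
    return warnings


print(q5_q6_warnings("a", "a"))  # []
print(q5_q6_warnings("d", "c"))  # ['Q5d selected: Q6d is required']
print(q5_q6_warnings("b", "d"))  # []
```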

2.3.7 Q7: focal mechanisms: are the focal mechanisms consistent with a natural and/or induced earthquake cause?—10 points

Although focal mechanisms are considered an important element in any earthquake study, as regards the HiQuake database in general, they have little power to distinguish human-induced earthquakes from natural ones. A number of studies of injection-induced seismicity report no systematic difference in focal mechanism between presumed-natural and presumed-induced earthquakes. In some cases, notably mining-related earthquakes, volumetric components have been reported that suggest the collapse of mine galleries (Dreger et al. 2008; McGarr 1992), in some instances supported by near-field observations. However, such cases are rare. For most cases in HiQuake, no focal mechanisms are available.

Most proposed human-induction processes, in particular for the large-MMAX cases we use as examples, “trigger” earthquakes. That is to say, a large fraction of the stress released by the resulting earthquakes is pre-existing natural tectonic stress on existing faults (McGarr et al. 2002). The focal mechanisms of such earthquakes are similar to natural earthquakes. Even where little pre-existing natural stress is released, PIEs may occur preferentially on pre-existing faults. As a result, only in exceptional cases are the focal mechanisms of PIEs not consistent with the regional stress regime. Of the 23 example cases we present in this paper (“Sect. 4”), ~80% returned equivocal results (yellow). Only 9% (i.e., two cases) required a human-induced origin and none required a natural origin.

Candidate answers for question 7 cover the situations where no information is available, and the four possible combinations of the expected natural- and induced focal mechanisms.

2.3.8 Q8: other–seismic data: are there other seismic data to support a natural or induced cause, e.g., swarm, foreshock-aftershock pattern, b-value, total number of earthquakes, radical change in style of seismicity, stress release corresponding to the earthquake magnitude or seismicity?—10 points

Questions 8 and 9 incorporate information not covered in the previous questions. Q8 refers to seismic data and Q9 to non-seismic data. Care is needed in applying Q8 to avoid “double scoring” an observation. For example, a radical change in seismicity may have been taken into account by Q5, if no previous seismicity is known for a region.

2.3.9 Q9: other–non-seismic data: are there non-seismic data that support a natural or induced cause, e.g., direct nucleation effects, precursory surface deformation?—10 points

Question 9 refers only to non-seismic data that are relevant to discriminating the earthquakes. An interesting example to illustrate the application of this question is ground deformation. Deformation in the epicentral area is expected for any moderate or major seismic activity, regardless of induction mechanism, and thus co-seismic deformation may result in an answer of Q9c (equivocal). On the other hand, if a pre-seismic, industry-related ground deformation pattern was reinforced by deformation related to PIEs, this may justify an answer of Q9d (human-induced).

Calculations showing that industrially induced stress increases were of a magnitude and pattern that would have encouraged the sense of seismic slip observed may also result in an answer of Q9d.

3 Examples

In this section, we describe in detail the application of E-PIE, by a single analyst, to three example cases. For illustrative purposes, we choose cases that returned a variety of results. These cases are earthquakes at the Groningen gas field in the Netherlands from 1984 to the present; the November 2017 M5.4 Pohang, South Korea, EGS-related PIE sequence; and the PIEs that followed the 2001 deep-penetrating bombing of Tora Bora, Afghanistan.

3.1 Groningen gas field, the Netherlands

The Groningen gas field in the Netherlands began production in 1963 and has been Europe’s largest gas producer for several decades. Earthquakes, now generally considered to have been induced by production-related reservoir compaction, began in 1986. By 2012, they had increased in magnitude to M3.6 and were causing public concern (van der Voort and Vanclay 2015).

The Groningen gas field is economically important to Europe. As a result, extensive research and development was conducted to understand the induction mechanism and estimate the size of the largest earthquake that might occur. A rich resource of information is thus available about the earthquakes. The field is monitored by a local seismic network that currently includes ~60 seismic stations, and over 1000 earthquakes have been recorded (e.g., Smith et al. 2019).

We applied E-PIE to the earthquakes as a whole in the gas field. The detailed results and influencing factors are given in Table 4 and shown graphically as a pie chart in Fig. 1. The results overwhelmingly favor a human-induced origin.

Table 4 Application of E-PIE to earthquakes in the Groningen gas field, the Netherlands
Fig. 1 Pie chart illustrating the E-PIE results for the proposed extraction-related earthquakes in the Groningen gas field

The only questions that did not return an answer of d. (human-induced) are Q7 and Q8. Q7 concerns earthquake focal mechanisms. As described in Sect. 2.3.7, this question differs from the others in that many induction processes are expected to trigger the release of natural tectonic stress on pre-existing faults, so the focal mechanisms may not differ from those of natural earthquakes. The Groningen earthquakes are such a case. Q8 considers seismic data not addressed by previous questions. Almost all such data can reasonably be argued to be consistent with both a natural and a human-induced origin, and thus Q8c (equivocal) is selected.

3.2 The Pohang Enhanced Geothermal Systems (EGS) project, South Korea

The November 2017 M5.4 Pohang earthquake sequence was the most damaging ever recorded instrumentally in South Korea. It injured 90 people and caused US$52 million of damage. It occurred very close in time and space to fluid injections performed for an enhanced geothermal system (EGS) project there and, as a result, has been proposed to have been induced by the operations (Grigoli et al. 2018; Kim et al. 2018). At the time of writing, the mainshock is the largest earthquake ever proposed to have been induced by EGS stimulation.

At the same time, some data and arguments weigh in favor of a natural origin (McGarr 2018), and some information is not publicly available. This PIE sequence is thus useful to illustrate the performance of E-PIE where the evidence is variable. The results for this case are shown in Table 5 and the corresponding visual pie-chart summary is shown in Fig. 2. Our assessment applies only to the PIE sequence that began in November 2017 at the Pohang EGS site. It does not encompass earlier injection-related earthquakes, which display different characteristics.

Table 5 Application of E-PIE to earthquakes associated with the November 2017 M5.4 Pohang EGS-related PIE sequence, South Korea
Fig. 2 Pie chart illustrating the E-PIE results for the November 2017 M5.4 Pohang EGS-related earthquake sequence, South Korea

3.3 Deep penetrating bombing

We are aware of one publication, that of Balassanian (2005) (Footnote 1), that proposes that deep penetrating bombing (also known as massive ordnance penetrators (MOPs), or “bunker-busters”) has induced earthquakes. Balassanian (2005) considers four potential cases: deep penetrating bombing at Kosovo, Yugoslavia (1999); Baghdad, Iraq (1991); Tora Bora, Afghanistan (2001); and Kirkuk, Iraq (2003).

The paper concludes that the case of the 2001 bombing of Tora Bora, Afghanistan, is the strongest, so we consider that case only. The argument presented rests solely on an increase, following the bombing, in the frequency of earthquakes in a particular magnitude range. Balassanian (2005) reports that two earthquakes with M > 5.5 occurred within 1000 km during the year before the bombing, compared with eight within 500 km and ten within 1000 km during the year after it.

The case is built on assertions that such a time distribution is unlikely to occur by chance and has not been observed anywhere else in the world. We applied E-PIE to this case and the results are given in Table 6 and Fig. 3.
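The assertion that such a change is unlikely to occur by chance can be illustrated with a deliberately naive calculation. This illustration is not taken from Balassanian (2005): it treats the two M > 5.5 earthquakes within 1000 km in the pre-bombing year as the true annual rate of a Poisson process and asks how likely ten or more would be in the following year. The Poisson model assumes independent events, whereas real catalogs contain aftershocks and other clustering, and a rate estimated from two events in a single year is highly uncertain; both assumptions tend to make chance occurrence look less likely than it may be. The function name `poisson_tail` is hypothetical.

```python
import math


def poisson_tail(k, lam):
    """P(N >= k) for N ~ Poisson(lam), from the complement of the CDF."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - cdf


# Naive reading of the Tora Bora counts (assumption): an annual rate of 2 events
# with M > 5.5 within 1000 km, and 10 or more such events in the following year.
p = poisson_tail(10, 2.0)
print(f"{p:.1e}")
```

For these inputs the function returns about 4.6 × 10⁻⁵. A number of this kind depends entirely on the independence and rate assumptions stated above.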

Table 6 Application of E-PIE to earthquakes following the 2001 deep penetrating bombing of Tora Bora, Afghanistan
Fig. 3 Pie chart illustrating the E-PIE results for the case of the 2001 deep penetrating bombing of Tora Bora, Afghanistan

4 Application of E-PIE to 23 test cases by multiple analysts

We explored the repeatability of E-PIE results between analysts, and the performance of the scheme, using a larger set of test cases. Five analysts independently applied the final scheme to the 23 test cases used for development (Table 1). Two of the analysts (analysts 4 and 5) had little or no involvement in the development work. The full results are given in the Appendix.

To assess the overall results, and the repeatability between analysts, we quantified the results as follows. We calculated the degree of support for a natural or human-induced origin by allocating numerical scores to answer-choices a.–d.:

  a. 0

  b. −1

  c. 0

  d. +1

For each question, these scores were multiplied by the number of points allocated to that question (either 10 or 100), and the total was normalized to a scale from −1 (100% consensus natural) to +1 (100% consensus human-induced). The proportion of ideal data available (i.e., the completeness of the dataset) was calculated by dividing the total number of points allocated by 360.
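The scoring and completeness calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for this study: the input format (a list of points-allocated/answer pairs) and the function name `score_case` are hypothetical, and the support score divides the weighted total by the points actually allocated, an assumption chosen so that a case answered entirely b. scores −1 and one answered entirely d. scores +1. The 360-point ideal total is consistent with three questions worth 100 points and six worth 10.

```python
# Hypothetical sketch of the E-PIE numerical summary described above.
ANSWER_SCORE = {"a": 0, "b": -1, "c": 0, "d": +1}  # scores from the list above
IDEAL_POINTS = 360  # points available when the ideal dataset exists


def score_case(answers):
    """Return (support, completeness) for one analyst's assessment of one case.

    answers: list of (points_allocated, answer_letter) pairs, e.g. (100, "d").
    Assumption: support is normalized by the points actually allocated.
    """
    allocated = sum(points for points, _ in answers)
    if allocated == 0:
        return 0.0, 0.0  # no usable data: neutral score, zero completeness
    weighted = sum(points * ANSWER_SCORE[letter] for points, letter in answers)
    return weighted / allocated, allocated / IDEAL_POINTS


# Hypothetical example: two 100-point questions and one 10-point question.
support, completeness = score_case([(100, "d"), (100, "c"), (10, "b")])
print(round(support, 2), round(completeness, 2))  # prints: 0.43 0.58
```

Keeping the two measures separate mirrors the text: support says which way the available evidence points, while completeness says how much of the ideal evidence exists at all.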

Figure 4 shows histograms of the contribution of each of the nine questions to the final scores. Questions 1–4 have the most power to identify human-induced earthquakes, questions 8 and 9 have some influence, and questions 5–7 are the least significant. If the question weightings are taken into account (lower panel, Fig. 4), questions 2–4 dominate the final result.

Fig. 4 For each of the five analysts, the average score for each of the nine questions. Upper panel: without weighting by the number of points allocated; lower panel: with weighting by the number of points allocated

Figure 5 summarizes the results for the five analysts and 23 cases. The degree of support for a natural or human-induced origin is displayed on the horizontal axis, which ranges from 100% consensus natural (left, green) to 100% consensus induced (right, red). The result for each analyst is shown as a label whose number gives that analyst's judgment of dataset completeness. The median result is shown as a label in large font and the mean result as a yellow label (see the Fig. 5 caption for details). The cases are ranked by mean result in order of strength of support for a human-induced origin, which ranges from very strong to entirely absent (i.e., a natural origin). Pie charts of the mean and median results for each case are shown in Fig. 6, and those for individual analysts are given in the Appendix.

Fig. 5 Summary of E-PIE results from five analysts for the 23 cases studied. The cases are ranked, top to bottom, by their mean result in order of the degree of support for a human-induced origin. Horizontal black lines separate the 23 cases, which are labeled at left on the vertical axis with their rank (large font) and case name/number (small font; see Table 1). For each case, the five horizontal white dashed lines correspond to analysts 1–5. The horizontal axis spans results from 100% support for a natural origin (green, far left) to 100% support for a human-induced origin (red, far right). Each white dashed line carries a label whose position indicates the strength of support for a natural/induced origin and whose number indicates the proportion of ideally desired data available. The median result is shown in a larger font and the mean of all analysts by a yellow label. For variations between analysts for individual questions, see Fig. 4.

Fig. 6 Pie charts showing, for the 23 test cases, the mean (left column) and median (right column) results of the five analysts

The cases for which the strongest evidence for human induction was found are the Cannikin, Alaska, nuclear weapons test (case #5, ranked 1st) and the Groningen gas field in the Netherlands (case #9, ranked 2nd). Analyst results for these cases were, respectively, 0.9, 0.9, 1.0, 1.0, and 0.9 (mean = 0.94; RMS = 0.89; median = 0.9; MAD = 0.04), and 1.0, 0.9, 0.9, 0.9, and 0.6 (mean = 0.86; RMS = 0.73; median = 0.9; MAD = 0.08). Both cases agree well between analysts (Fig. 5, Appendix).

The cases for which the evidence for human induction is weakest are the Horse Hill, UK, oil-extraction case (later time/more complete dataset) (case #23, ranked 23rd) and the Selemo and Lesedi pilot pods, Botswana, coal-bed methane case (case #7, ranked 22nd). Analyst results for these cases were, respectively, all −1.0 (mean = −1.00; RMS = 0.00; median = −1.00; MAD = 0.00), and −0.5, 0.0, 0.4, −0.6, and −0.5 (mean = −0.24; RMS = 0.54; median = −0.5; MAD = 0.30).
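The per-case summary statistics quoted above can be computed with a short sketch. The quoted MAD values correspond to a mean absolute deviation taken about the median rather than the mean, so that is what the sketch computes; the RMS measure is omitted because its exact definition is not given here. The function name `summarize` is hypothetical.

```python
import statistics


def summarize(scores):
    """Mean, median, and MAD (mean absolute deviation about the median)."""
    med = statistics.median(scores)
    mad = sum(abs(s - med) for s in scores) / len(scores)
    return statistics.mean(scores), med, mad


# Analyst results quoted above for three of the 23 test cases.
cases = {
    "Cannikin (#5)": [0.9, 0.9, 1.0, 1.0, 0.9],
    "Groningen (#9)": [1.0, 0.9, 0.9, 0.9, 0.6],
    "Selemo and Lesedi (#7)": [-0.5, 0.0, 0.4, -0.6, -0.5],
}
for name, scores in cases.items():
    mean, med, mad = summarize(scores)
    print(f"{name}: mean={mean:.2f} median={med:.2f} MAD={mad:.2f}")
```

Rounded to two decimals, the printed mean, median, and MAD values match those quoted in the text for these three cases.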

We investigate the relationship between the degree of support for a human-induced origin and the spread of results between analysts. In Fig. 7, we plot mean result vs. RMS variation between analysts (blue dots) and median result vs. the MAD (mean absolute deviation; red dots).

Fig. 7 For the 23 test cases, degree of support for a human-induced origin vs. variation in results between the five analysts. Blue: mean result vs. RMS variation between analysts. Red: median result vs. MAD (mean absolute deviation) between analysts

With the exception of the 23rd-ranked case (Horse Hill later time), which triggered the “exit” criterion for question 1, the spread of results between analysts is systematically greater for cases where the evidence for human induction is weaker. This relationship is strongest for the mean vs. RMS results.

The source of the variation in opinion between analysts for individual cases may be deduced from Fig. 4, which shows that analysts vary in their receptiveness to proposals of human induction. Q2, Q3, and Q4 carry the majority of the points available, and analyst 5 is more likely to answer those questions in favor of human induction than, for example, analyst 2.

4.1 Comparison with the results of Verdon et al. (2019)

We compare the results of E-PIE with those of Verdon et al. (2019) for the cases of Preese Hall, Brockham, and Horse Hill, all in the UK. Verdon et al. (2019) used these cases to illustrate the performance of their scheme at early stages of data gathering, when the datasets were rudimentary, and at later stages, when the datasets were more complete. The results for measures of both dataset completeness and strength of support for human induction are shown in Fig. 8, and the numerical data are given in Table 7.

Fig. 8 Correlation between the results of the scheme of Verdon et al. (2019) and E-PIE for data completeness (ESR vs. E-PIE data completeness) and strength of evidence for human induction (ESR*IAR vs. the mean and median E-PIE measures)

Table 7 Comparison of the results determined using E-PIE, and those determined by Verdon et al. (2019), using their original scheme, for the cases of Preese Hall, Brockham, and Horse Hill, UK, at early and late stages of data collection. ESR: evidence strength ratio—a measure of dataset completeness (Verdon, et al. 2019), IAR: induced assessment ratio—a measure of the tendency of the available data to support a human-induced origin (Verdon, et al. 2019). ESR*IAR is the induced assessment ratio weighted by the ESR measure of data completeness and indicates the overall strength of evidence for a human-induced origin

The correlation between E-PIE and analogous Verdon et al. (2019) results is good for all measures. The scheme of Verdon et al. (2019) found that for the case of Preese Hall, the early-time data suggested a weak case for human induction, which greatly strengthened when more evidence became available. For Brockham, a weak case for a natural origin strengthened with more data, and for Horse Hill, an initially very weak case for human induction became a strong case for a natural origin. The E-PIE scheme produced similar results.

5 Discussion

5.1 E-PIE: need and design

E-PIE was developed to fulfill a hitherto unmet need for a tool that can be applied to all 1239 cases currently in the HiQuake database. The tool needed to be sufficiently generalized to remain applicable to new processes and data in the future. The amount of information documenting individual cases in HiQuake currently varies greatly. Some cases are documented by large bodies of publications, whereas others may be mentioned only in a single table based solely on unpublished or obscure literature. Grading all the cases in HiQuake is thus challenging, but the work is ongoing because the results are needed. The gradings produced are not intended to replace in-depth studies of individual cases, nor to be final. They aim to provide initial, homogeneous gradings for the entire database to facilitate sorting of its cases for future analyses that require it.

E-PIE was designed and worded with extreme care. The “Generic Verdon” scheme (Foulger, et al. 2022; Verdon, et al. 2019) was used as a starting template and E-PIE retains several features of best practice from that scheme, including enhanced weighting of the most diagnostically powerful questions. The final questionnaire was informed by group familiarity with an extensive range of diverse cases in HiQuake, multiple discussions, trials by four analysts, analysis of discrepancies, and additional iterations of the text. The final scheme:

  • is consistent in language between questions

  • minimizes the duplication of concepts across multiple questions, e.g., the spatial distance between the PIEs and the environmentally perturbed region

  • allows all information to be taken into account

  • produces intuitive results for non-controversial cases

  • has generally good repeatability between different analysts

At the same time, it balances several trade-offs. We aimed to design an appropriate point-allocation scheme between questions while avoiding an overly complex, or unrealistically precise, numerical weighting scheme. Our final choice of points was not based on specific quantitative study, and other choices could have been made. Future application of the scheme may reveal whether our choices are optimal. We aimed to use uniform language, minimizing the proliferation of qualitative terms such as “significantly,” “coincident with,” “some correlation,” “could be,” “distant,” and “sufficiently.” We avoided bias toward common induction processes, e.g., hydrocarbon extraction or fracking. We provided avenues for all relevant information to be included while minimizing scheme complexity to ensure accessibility to as many stakeholders as possible.

Regardless of efforts to perfect question wordings and carefully explain the envisaged application of E-PIE, variations in results between analysts are inevitable because of the qualitative nature of the exercise. It is thus desirable to express the results in a way that conveys this inherent imprecision. We therefore designed a visual display for the final results that signals their non-exactness: a colored pie chart without annotation.

5.2 Induced vs. triggered earthquakes

As pointed out by McGarr et al. (2002), industrial activity may modulate stress in the crust in a manner that encourages the release of pre-existing tectonic stress in earthquakes. The stress released may thus not be solely that added by the industrial activity and may instead be predominantly natural. Such cases are known as “triggered” earthquakes. In triggered cases, the industrial activity may merely have advanced the timing of inevitable natural seismicity, although this can never be proven. In the current paper, we do not distinguish between induced and triggered earthquakes, and cases of both types are included in HiQuake.

The 2015 M7.8 Gorkha, Nepal, sequence is an example of a proposed triggered earthquake sequence. The majority of the stress released by this large earthquake sequence was undisputedly natural. Application of E-PIE finds moderate evidence that the earthquake sequence was induced. This should be interpreted as evidence that the industrial activity modulated the time of occurrence of the seismic stress release and not that all the stress released was of industrial origin.

5.3 Human-induced or not?

Early questionnaire systems that allocated points out of a total of the order of 10 divided the results into categories with textual descriptions, e.g., “almost certainly induced/probably induced/possibly induced/tectonic” (Frohlich et al. 2016; see Foulger et al. 2022, Sect. 2, for a summary). The scheme of Verdon et al. (2019), which allocates a maximum of 96 points, avoided doing this.

In the current paper, we adopt the same approach as Verdon et al. (2019). Our results show clearly that there is a continuum in plausibility between cases and at present any division into categories would be arbitrary. Divisions may be recommended in future when further analysis and comparisons between schemes provide a logical basis.

Foulger et al. (2022) found that an adapted version of the scheme of Verdon et al. (2019) could identify earthquakes with induced characteristics that had not yet been proposed to be induced. E-PIE is structurally and philosophically similar to the scheme of Verdon et al. (2019) and is thus expected to have this capability also. This is not tested in the present paper but would be a useful direction of future work.

Another question is how well E-PIE will perform in the future, when new findings have extended knowledge and understanding of seismic induction, possibly even identifying entirely new mechanisms. E-PIE has the broadest and least prescriptive wording of any existing questionnaire scheme designed for this purpose, and in theory it should be the most flexible going forward. Future experience will reveal how well it stands the test of time.

5.4 Repeatability between multiple analysts and comparisons with a specialist scale

Application of E-PIE to a larger dataset—23 diverse cases—by five analysts yielded encouraging results. The results were easily quantifiable to provide numerical results for both completeness of the dataset and strength of support for human induction. Questions 1–4 have the strongest diagnostic power to detect human-induced earthquake sequences, with questions 5–9 contributing subsidiary information.

The degree of repeatability between analysts correlated positively with the mean/median strength of support for human induction. Thus, the weaker the case for human induction, the more controversial it was, even between specialist analysts. At the same time, it is encouraging that E-PIE yields a high degree of agreement between specialists for the cases most confidently identified as human induced.

The results correlated well with those produced by Verdon et al. (2019) using their fluid-injection-oriented scheme. This includes cases at both early and late stages of data collection, measures of both dataset quality and support for human induction, and the direction of travel of the latter measure as more evidence became available. This is an important validation result that suggests E-PIE, a highly generalized scheme designed to be applicable to all seismogenic industrial processes, behaves well compared with an advanced, specialist scheme.

5.5 Reconciliation by discussion and moderation

As discussed by Foulger et al. (2022), it would clearly be possible to reduce the spread of results between analysts in controversial cases by group discussion and moderation. In the present paper, we did not attempt reconciliation by discussion to minimize the number of variables explored at one time. Reconciliation through discussion and moderation introduces additional non-objective, group-dynamic factors well known in the social sciences, such as the strength of individuals’ personalities, their oratory skills, and perceived institutional hierarchical status. Nevertheless, we recognize that this process may be expedient in real situations and exploring it will be the subject of future work.

6 Future work

Future work may include the following:

  1. Apply E-PIE to all cases in HiQuake and add the results to the database. This work is currently underway.

  2. Investigate and develop best practice for reconciliation of results through discussion and moderation.

  3. Test the power of E-PIE to recognize hitherto unsuspected cases of potentially induced earthquakes.

  4. Translate E-PIE into other languages and expand HiQuake and E-PIE grading to cases not currently accessible due to language issues.

7 Summary

The main results of our work may be summarized:

  1. To create a tool suitable to assess all proposals of human-induced origins, we developed a fully generalized questionnaire scheme, E-PIE, suitable for estimating (a) the completeness of the available data and (b) the strength of evidence for a natural or human-induced origin.

  2. We designed a visual method for displaying the results that avoids implying unrealistic precision: a colored pie chart without annotation.

  3. We describe E-PIE in detail and illustrate its step-by-step application to three example cases.

  4. Five analysts independently applied E-PIE to a suite of 23 cases from the HiQuake database of proposed human-induced earthquakes.

  5. Spatial correlations of earthquakes with the region of environmental modulation by the industrial activity, and temporal correlations with industrial operations, are the most diagnostically powerful indicators of human induction.

  6. E-PIE yielded good repeatability between independent analysts for cases with a high confidence of a human-induced origin. There was systematically less agreement between analysts for low-confidence cases.

  7. Results from E-PIE compared well with those obtained by Verdon et al. (2019) using their specialist fluid-injection scheme.