A preliminary text classification of the precursory accelerating seismicity corpus: inference on some theoretical trends in earthquake predictability research from 1988 to 2018

Abstract

Text analytics based on supervised machine learning has shown great promise in a multitude of domains but has yet to be applied to seismology. We describe some common classifiers (Naïve Bayes, k-Nearest Neighbors, Support Vector Machines, and Random Forests) as well as the standard steps of supervised learning (training, validation of model parameter adjustments, and testing). To illustrate text classification on a seismological corpus, we use a hundred articles related to the topic of precursory accelerating seismicity, spanning 1988 to 2010. This corpus was labelled by Mignan [Tectonophysics, 2011] according to whether the precursor is explained by critical processes (i.e., cascade triggering) or by other processes (such as the signature of main fault loading). We investigate how the classification process can be automated to help analyze larger corpora and thereby better understand trends in earthquake predictability research. We find that the Naïve Bayes model performs best, in agreement with the machine learning literature for small datasets, with cross-validation accuracies showing the model’s predictive ability for both binary classification (“critical process” or else) and multiclass classification (“non-critical process,” “agnostic,” “critical process assumed,” “critical process demonstrated”). Prediction on a dozen articles published since 2011, however, shows weak generalization, which can be explained, in part, by the empirical variance of the small training set. This preliminary study demonstrates the potential of supervised learning to reveal textual patterns in the seismological literature. Manual labelling remains essential but is made transparent by an investigation of Naïve Bayes keyword posterior probabilities.
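The abstract credits Naïve Bayes with the best cross-validation accuracy and uses its keyword posterior probabilities to make the manual labelling transparent. As an illustration only, the following is a minimal, standard-library Python sketch of a multinomial Naïve Bayes text classifier with additive (Laplace) smoothing, k-fold cross-validation accuracy, and per-keyword posteriors P(class | word). It is not the author's implementation (the reference list points to R packages such as quanteda); the tokenizer, the names `NaiveBayesText`, `cross_val_accuracy` and `keyword_posterior`, and the eight toy documents labelled "critical"/"other" are hypothetical and are not taken from the Mignan (2011) corpus.

```python
import math
import random
from collections import Counter


def tokenize(text):
    """Lowercase bag-of-words tokens; non-alphanumerics act as separators (an assumption)."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


class NaiveBayesText:
    """Hypothetical multinomial Naive Bayes text classifier with additive smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # alpha = 1.0 corresponds to Laplace smoothing

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            counts[y].update(tokenize(doc))
        self.vocab = set().union(*counts.values())
        n_docs = Counter(labels)
        self.log_prior = {c: math.log(n_docs[c] / len(labels)) for c in self.classes}
        v = len(self.vocab)
        self.log_lik = {}
        for c in self.classes:
            # Smoothed P(word | class) = (count + alpha) / (words in class + alpha * V)
            denom = sum(counts[c].values()) + self.alpha * v
            self.log_lik[c] = {
                w: math.log((counts[c][w] + self.alpha) / denom) for w in self.vocab
            }
        return self

    def predict(self, doc):
        # Words never seen during training are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        scores = {
            c: self.log_prior[c] + sum(self.log_lik[c][t] for t in tokens)
            for c in self.classes
        }
        return max(scores, key=scores.get)

    def keyword_posterior(self, word):
        """P(class | word) for one in-vocabulary keyword (raises KeyError otherwise)."""
        joint = {
            c: math.exp(self.log_prior[c] + self.log_lik[c][word]) for c in self.classes
        }
        total = sum(joint.values())
        return {c: p / total for c, p in joint.items()}


def cross_val_accuracy(docs, labels, k=5, seed=0):
    """Mean accuracy over k folds; each fold is held out once for testing."""
    if not 2 <= k <= len(docs):
        raise ValueError("k must lie between 2 and the number of documents")
    idx = list(range(len(docs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = NaiveBayesText().fit([docs[i] for i in train], [labels[i] for i in train])
        correct = sum(model.predict(docs[i]) == labels[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / k


# Hypothetical toy corpus, invented for illustration (not from the article's corpus).
docs = [
    "cascade triggering near criticality power law",
    "critical point renormalization group cascade",
    "self-organized criticality cascade of events",
    "critical exponent power law time to failure",
    "stress accumulation on main fault loading",
    "fault loading stress transfer elastic rebound",
    "stress accumulation model loading of fault",
    "elastic stress shadow and fault loading",
]
labels = ["critical"] * 4 + ["other"] * 4

if __name__ == "__main__":
    model = NaiveBayesText().fit(docs, labels)
    print(model.predict("cascade triggering and power law"))  # critical
    print(round(model.keyword_posterior("cascade")["critical"], 3))  # 0.8
    print(cross_val_accuracy(docs, labels, k=4))
```

With equal class priors, as in this toy corpus, a keyword posterior reduces to the ratio of smoothed per-class word frequencies, so a high P(critical | word) flags words the classifier treats as evidence for the critical-process label. The multiclass case works the same way once the four labels listed above are supplied.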

Figs. 1–3 (not reproduced here)

References

  1. Adamaki AK, Roberts RG (2017) Precursory activity before larger events in Greece revealed by aggregated seismicity data. Pure Appl Geophys 174:1331–1343. https://doi.org/10.1007/s00024-017-1465-6

  2. Aggarwal CC (2018) Machine learning for text. Springer Nature, 493 pp. https://doi.org/10.1007/978-3-319-73531-3

  3. Bak P, Tang C (1989) Earthquakes as a self-organized critical phenomenon. J Geophys Res 94:15,635–15,637

  4. Bennett KP, Campbell C (2000) Support vector machines: hype or hallelujah? SIGKDD Explor 2:1–13

  5. Benoit K (2018) Quantitative analysis of textual data, package 'quanteda', available at https://cran.r-project.org/web/packages/quanteda/ (last accessed August 2018)

  6. Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022

  7. Bouchon M, Marsan D (2015) Reply to 'Artificial seismic acceleration'. Nat Geosci 8:83

  8. Bouchon M, Durand V, Marsan D, Karabulut H, Schmittbuhl J (2013) The long precursory phase of most large interplate earthquakes. Nat Geosci 6:299–302

  9. Breiman L (2001) Random forests. Mach Learn 45:5–32

  10. Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Chapman & Hall/CRC, Taylor & Francis Group 358 pp

  11. Bufe CG, Varnes DJ (1993) Predictive modeling of the seismic cycle of the greater San Francisco Bay region. J Geophys Res 98:9,871–9,883

  12. Christou EV, Karakaisis G, Scordilis E (2016) Time dependent seismicity along the western coast of Canada. Res Geophys 5:5730

  13. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20:273–297

  14. Cover TM, Hart PE (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory IT-13:21–27

  15. De Santis A, Cianchini G, Di Giovambattista R (2015) Accelerating moment release revisited: examples of application to Italian seismic sequences. Tectonophysics 639:82–98. https://doi.org/10.1016/j.tecto.2014.11.015

  16. Domingos P, Pazzani M (1997) On the optimality of the simple Bayesian classifier under zero-one loss. Mach Learn 29:103–130

  17. Felzer KR, Page MT, Michael AJ (2015) Artificial seismic acceleration. Nat Geosci 8:82–83

  18. Forman G (2008) BNS feature scaling: an improved representation over TF-IDF for SVM text classification. Proc 17th ACM Conf Inf Knowl Manag:263–270

  19. Freund Y, Schapire RE (1999) A short introduction to boosting. J Japanese Soc AI 14:771–780

  20. Geller RJ (1997) Earthquake prediction: a critical review. Geophys J Int 131:425–450

  21. Glez-Peña D, Lourenço A, López-Fernández H, Reboiro-Jato M, Fdez-Riverola F (2013) Web scraping technologies in an API world. Brief Bioinform 15:788–797

  22. Grimmer J, Stewart BM (2013) Text as data: the promise and pitfalls of automatic content analysis methods for political texts. Polit Anal 21:267–297. https://doi.org/10.1093/pan/mps028

  23. Grün B, Hornik K (2017) Topic models, package 'topicmodels', available at https://cran.r-project.org/web/packages/topicmodels/ (last accessed August 2018)

  24. Guilhem A, Bürgmann R, Freed AM, Tabrez Ali S (2013) Testing the accelerating moment release (AMR) hypothesis in areas of high stress. Geophys J Int 195:785–798. https://doi.org/10.1093/gji/ggt298

  25. Hardebeck JL, Felzer KR, Michael AJ (2008) Improved tests reveal that the accelerating moment release hypothesis is statistically insignificant. J Geophys Res 113:B08310. https://doi.org/10.1029/2007JB005410

  26. Hechenbichler K, Schliep KP (2004) Weighted k-nearest-neighbor techniques and ordinal classification. Discussion paper 399, SFB 386, Ludwig-Maximilians University, Munich

  27. Hough S (2010) Predicting the unpredictable: the tumultuous science of earthquake prediction. Princeton University Press 272 pp

  28. Huang H, Meng L (2018) Slow unlocking processes preceding the 2015 Mw 8.4 Illapel, Chile, earthquake. Geophys Res Lett 45:3914–3922. https://doi.org/10.1029/2018GL077060

  29. Jain AK (2010) Data clustering: 50 years beyond K-means. Pattern Recogn Lett 31:651–666. https://doi.org/10.1016/j.patrec.2009.09.011

  30. Jiang C, Wu Z (2012) Insights into the long-to-intermediate-term pre-shock accelerating moment release (AMR) from the March 11, 2011, off the Pacific coast of Tohoku, Japan, M9 earthquake. Earth Planets Space 64:765–769

  31. Jiang C, Wu Z (2013) Intermediate-term medium-range precursory accelerating seismicity prior to the 12 May 2008, Wenchuan earthquake. Pure Appl Geophys 170:209–219. https://doi.org/10.1007/s00024-011-0413-0

  32. Joachims T (1998) Text categorization with support vector machines: learning with many relevant features. Mach Learn ECML-98:137–142

  33. Karakaisis GF, Papazachos CB, Scordilis EM (2013) Recent reliable observations and improved tests on synthetic catalogs with spatiotemporal clustering verify precursory decelerating-accelerating seismicity. J Seismol 17:1063–1072. https://doi.org/10.1007/s10950-013-9372-5

  34. Karatzoglou A, Smola A, Hornik K, Zeileis A (2004) kernlab—an S4 package for kernel methods in R. J Stat Softw 11:1–20

  35. Kazemian J, Hatami MR (2017) Temporal variations of seismic parameters in Tehran region. Pure Appl Geophys 174:3841–3852. https://doi.org/10.1007/s00024-017-1549-3

  36. Kharde VA, Sonawane SS (2016) Sentiment analysis of Twitter data: a survey of techniques. Int J Comput Appl 139:5–15

  37. King GCP (1983) The accommodation of large strains in the upper lithosphere of the earth and other solids by self-similar fault systems: the geometrical origin of b-value. Pure Appl Geophys 121:761–815

  38. Kohavi R (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. IJCAI'95 Proceed 14th Int Joint Conf AI 2:1137–1143

  39. Kuhn T (1970) The structure of scientific revolutions, enlarged. In: International encyclopedia of unified science, 2nd edn. The University of Chicago Press 210 pp

  40. Lagios E, Papadimitriou P, Novali F, Sakkas V, Fumagalli A, Vlachou K, Del Conte S (2012) Combined seismicity pattern analysis, DGPS and PSInSAR studies in the broader area of Cephalonia (Greece). Tectonophysics 524-525:43–58. https://doi.org/10.1016/j.tecto.2011.12.015

  41. Liaw A, Wiener M (2018) Breiman and Cutler's random forests for classification and regression, package 'randomForest', available at https://cran.r-project.org/web/packages/randomForest/ (last accessed August 2018)

  42. Mignan A (2011) Retrospective on the accelerating seismic release (ASR) hypothesis: controversy and new horizons. Tectonophysics 505:1–16. https://doi.org/10.1016/j.tecto.2011.03.010

  43. Mignan A (2012) Seismicity precursors to large earthquakes unified in a stress accumulation framework. Geophys Res Lett 39:L21308. https://doi.org/10.1029/2012GL053946

  44. Mignan A (2014) The debate on the prognostic value of earthquake foreshocks: a meta-analysis. Sci Rep 4:4099. https://doi.org/10.1038/srep04099

  45. Mignan A (2015) Modeling aftershocks as a stretched exponential relaxation. Geophys Res Lett 42:9726–9732. https://doi.org/10.1002/2015GL066232

  46. Mignan A, King GCP, Bowman D (2007) A mathematical formulation of accelerating moment release based on the stress accumulation model. J Geophys Res 112:B07308. https://doi.org/10.1029/2006JB004671

  47. Mouselimis L (2018) Kernel k nearest neighbors, package 'KernelKnn', available at https://cran.r-project.org/web/packages/KernelKnn/ (last accessed August 2018)

  48. Ng AY, Jordan MI (2001) On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. Adv Neural Inf Proces Syst 14:605–610

  49. Ng S-K, Wong M (1999) Toward routine automatic pathway discovery from on-line scientific text abstracts. Genome Inform 10:104–112

  50. Ogata Y (1988) Statistical models for earthquake occurrences and residual analysis for point processes. J Am Stat Assoc 83:9–27

  51. Papadopoulos GA (1988) Long-term accelerating foreshock activity may indicate the occurrence time of a strong shock in the Western Hellenic Arc. Tectonophysics 152:179–192

  52. Papazachos BC, Karakaisis GF, Papazachos CB, Scordilis EM (2007) Evaluation of the results for an intermediate-term prediction of the 8 January 2006 Mw 6.9 Cythera earthquake in the southwestern Aegean. Bull Seismol Soc Am 97:347–352. https://doi.org/10.1785/0120060075

  53. Pearce D, Rantala V (1983) New foundations for metascience. Synthese 56:1–26

  54. Pliakis D, Papakostas T, Vallianatos F (2012) A first principles approach to understand the physics of precursory accelerating seismicity. Ann Geophys 55:165–170. https://doi.org/10.4401/ag-5363

  55. Rokach L (2010) Ensemble-based classifiers. Artif Intell Rev 33:1–39. https://doi.org/10.1007/s10462-009-9124-7

  56. Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323:533–536

  57. Salton G, McGill M (eds) (1983) Introduction to modern information retrieval. McGraw-Hill

  58. Sammis CG, Sornette D (2002) Positive feedback, memory, and the predictability of earthquakes. PNAS 99:2501–2508. https://doi.org/10.1073/pnas.012580999

  59. Sebastiani F (2002) Machine learning in automated text categorization. ACM Comput Surv 34:1–47

  60. Seif S, Mignan A, Zechar JD, Werner MJ, Wiemer S (2017) Estimating ETAS: the effects of truncation, missing data, and model assumptions. J Geophys Res Solid Earth 122:449–469. https://doi.org/10.1002/2016JB012809

  61. Seif S, Zechar JD, Mignan A, Nandan S, Wiemer S (2018) Foreshocks and their potential deviation from general seismicity. Bull Seismol Soc Am 109:1–18. https://doi.org/10.1785/0120170188

  62. Sornette D (2000) Critical phenomena in natural sciences: chaos, fractals, self-organization and disorder: concepts and tools. Springer 434 pp

  63. Steinwart I, Christmann A (2008) Support vector machines, information science and statistics. Springer 601 pp

  64. Tsytsarau M, Palpanas T (2012) Survey on mining subjective data on the web. Data Min Knowl Disc 24:478–514. https://doi.org/10.1007/s10618-011-0238-6

  65. Welbers K, Van Atteveldt W, Benoit K (2017) Text analysis in R. Commun Methods Meas 11:245–265. https://doi.org/10.1080/19312458.2017.1387238

Acknowledgments

I thank Pablo Nieto and Marco Broccardo for discussions on the topic of text classification, as well as reviewer Riccardo Zaccarelli for his valuable comments.

Data and resources

All the corpus articles are available on journal websites. The corpus meta-data and labelling are provided in the supplementary material to this article.

Author information

Corresponding author

Correspondence to A. Mignan.

Electronic supplementary material

ESM 1

(DOCX 24 kb)

ESM 2

(JSON 171 kb)

ESM 3

(JSON 5 kb)

ESM 4

(JSON 5 kb)

About this article

Cite this article

Mignan, A. A preliminary text classification of the precursory accelerating seismicity corpus: inference on some theoretical trends in earthquake predictability research from 1988 to 2018. J Seismol 23, 771–785 (2019). https://doi.org/10.1007/s10950-019-09833-2

Keywords

  • Machine learning
  • Supervised learning
  • Earthquake precursor
  • Critical phenomena