Ontology-Based BERT Model for Automated Information Extraction from Geological Hazard Reports

Journal of Earth Science

Abstract

Geological knowledge can support knowledge discovery, knowledge inference, and mineralization prediction in geological big data. Entity identification and relationship extraction from geological descriptive text are key steps in constructing knowledge graphs. Given the lack of publicly available annotated datasets in the geology domain, this paper describes the construction of a geological entity dataset, defines entity types and inter-concept relationships using a geological entity concept system, and completes the construction of a geological corpus. Existing language models such as Word2vec and GloVe cannot handle polysemous words and fuse context poorly. To address these shortcomings, we propose a geological named entity recognition and relationship extraction model built on the pretrained language model Bidirectional Encoder Representations from Transformers (BERT). To represent text features effectively, we construct a BERT-bidirectional gated recurrent unit (BiGRU)-conditional random field (CRF) architecture to extract named entities and a BERT-BiGRU-Attention architecture to extract entity relations. The results show that the BERT-BiGRU-CRF named entity recognition model achieves an F1-score of 0.91 and the BERT-BiGRU-Attention relationship extraction model achieves an F1-score of 0.84, both significant improvements over classic language models (e.g., Word2vec and Embeddings from Language Models (ELMo)).
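To illustrate what the CRF layer contributes on top of the BERT-BiGRU encoder, the following is a minimal sketch of Viterbi decoding for a linear-chain CRF. It is not the authors' implementation: the tag names (`B-HAZ`, `I-HAZ`), the emission scores, and the transition penalty are all hypothetical values chosen for illustration, and in a real model the emission scores would come from the encoder and the transition scores would be learned.

```python
from typing import Dict, List, Tuple


def viterbi_decode(emissions: List[Dict[str, float]],
                   transitions: Dict[Tuple[str, str], float],
                   tags: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][tag] is the per-token score for `tag` at position t.
    transitions[(prev, cur)] is the tag-to-tag score (0.0 if missing).
    """
    if not emissions:
        return []
    # best[t][tag]: best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0))
            scores[cur] = (best[-1][prev]
                           + transitions.get((prev, cur), 0.0)
                           + emissions[t][cur])
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical tag set and scores (assumptions, not taken from the paper)
tags = ["O", "B-HAZ", "I-HAZ"]
transitions = {("O", "I-HAZ"): -10.0}  # penalize an entity that starts with I-
emissions = [
    {"O": 2.0, "B-HAZ": 0.5, "I-HAZ": 0.1},
    {"O": 0.2, "B-HAZ": 1.0, "I-HAZ": 1.2},
    {"O": 0.1, "B-HAZ": 0.3, "I-HAZ": 1.5},
]

greedy = [max(tags, key=lambda tag: e[tag]) for e in emissions]
decoded = viterbi_decode(emissions, transitions, tags)
```

With these made-up scores, picking the best tag per token independently yields `O, I-HAZ, I-HAZ`, which is not a valid BIO sequence, whereas the sequence-level decode returns `O, B-HAZ, I-HAZ`. This is the kind of label-consistency constraint a CRF output layer is generally used for in named entity recognition.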

Acknowledgments

This study was financially supported by the National Key R & D Program of China (No. 2022YFF0711601), the Natural Science Foundation of Hubei Province of China (No. 2022CFB640), the Open Fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources (No. KF-2022-07-014), the Opening Fund of Hubei Key Laboratory of Intelligent Vision-Based Monitoring for Hydroelectric Engineering (No. 2022SDSJ04), and the Beijing Key Laboratory of Urban Spatial Information Engineering (No. 20220108). The final publication is available at Springer via https://doi.org/10.1007/s12583-022-1724-z.

Author information

Corresponding author

Correspondence to Rong Huang.

Ethics declarations

The authors declare that they have no conflict of interest.

About this article

Cite this article

Ma, K., Tian, M., Tan, Y. et al. Ontology-Based BERT Model for Automated Information Extraction from Geological Hazard Reports. J. Earth Sci. 34, 1390–1405 (2023). https://doi.org/10.1007/s12583-022-1724-z

  • DOI: https://doi.org/10.1007/s12583-022-1724-z
