Abstract
Mineral exploration reports offer valuable insight into the geological settings of mineral deposits. However, because the reports are unstructured text, geologists cannot easily extract meaningful geological information without manually reading a large number of them. In this work, geological data relevant to mineralization and ore-forming conditions are automatically retrieved from such underutilized exploration reports. It is shown that a knowledge graph (KG) of ore-forming conditions can harmonize heterogeneous data to enable effective and efficient data-driven discovery. This is accomplished by creating KGs that define the geological entities in exploration reports and the relationships among them. An end-to-end joint entity and relation extraction model is constructed with an improved entity annotation strategy: a bidirectional long short-term memory network extracts sequence features, and a graph convolutional neural network encodes the syntactic structural information obtained from dependency analysis. In this research, six dominant entities and 24 relation types are considered. The generation of high-quality KGs from geological reports of iron ore deposits illustrates the effectiveness of our methodology. The results indicate that the structured information contained in the KGs accurately represents the contents of the source reports and is consistent with domain knowledge. The proposed approaches can quickly and reliably convert text data into structured form, and the results indicate that KG procedures can support knowledge discovery on metallogenesis and spatiotemporal evolution in mineral exploration. Our study addresses the scarcity of machine-readable KGs for ore-forming conditions and will aid the integration of geological data from diverse sources in data-intensive research.
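To make the idea of a KG built from extracted entity-relation pairs concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the extraction model outputs (head, relation, tail) triples; the entity names, relation names, and helper functions (`build_graph`, `neighbors`, `two_hop`) are hypothetical and do not reproduce the six entities or 24 relation types of the study.

```python
from collections import defaultdict

# Hypothetical triples (head, relation, tail). The names are illustrative only
# and are not taken from the schema used in the study.
TRIPLES = [
    ("Deposit A", "hosted_in", "Formation X"),
    ("Deposit A", "contains_mineral", "magnetite"),
    ("Deposit A", "controlled_by", "Fault F1"),
    ("Formation X", "has_age", "Proterozoic"),
    ("Deposit A", "contains_mineral", "magnetite"),  # repeated in a second report
]


def build_graph(triples):
    """Merge triples into an adjacency map: head -> relation -> set of tails.

    Storing tails in a set collapses facts repeated across reports.
    """
    graph = defaultdict(lambda: defaultdict(set))
    for head, relation, tail in triples:
        graph[head][relation].add(tail)
    return graph


def neighbors(graph, entity, relation):
    """Return the sorted tails linked to `entity` by `relation`."""
    # .get() avoids creating empty entries for unknown entities.
    return sorted(graph.get(entity, {}).get(relation, set()))


def two_hop(graph, entity, first, second):
    """Follow `first` from `entity`, then `second` from each intermediate node."""
    found = set()
    for middle in neighbors(graph, entity, first):
        found.update(neighbors(graph, middle, second))
    return sorted(found)


graph = build_graph(TRIPLES)
print(neighbors(graph, "Deposit A", "contains_mineral"))
print(two_hop(graph, "Deposit A", "hosted_in", "has_age"))
```

The two-hop query illustrates why a graph structure helps harmonize facts scattered across reports: the age of the host formation is reachable from the deposit even though no single triple states it directly.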
Acknowledgements
This study was supported by the IUGS Deep-time Digital Earth (DDE) Big Science Program. Financial support was provided by the National Natural Science Foundation of China (42050101, U1711267, 41871311, 41871305), the China Postdoctoral Science Foundation (No. 2021M702991), and the Open Research Project of the Hubei Key Laboratory of Intelligent Geo-Information Processing (No. KLIGIP-2021A01).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Qiu, Q., Ma, K., Lv, H. et al. Construction and Application of a Knowledge Graph for Iron Deposits Using Text Mining Analytics and a Deep Learning Algorithm. Math Geosci 55, 423–456 (2023). https://doi.org/10.1007/s11004-023-10050-4
DOI: https://doi.org/10.1007/s11004-023-10050-4