Large-Scale Multi-granular Concept Extraction Based on Machine Reading Comprehension

Yuan, Siyu; Yang, Deqing; Liang, Jiaqing; Sun, Jilun; Huang, Jingyue; Cao, Kaiyan; Xiao, Yanghua; Xie, Rui

doi:10.1007/978-3-030-88361-4_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12922)

Included in the following conference series:

International Semantic Web Conference

Abstract

The concepts in knowledge graphs (KGs) enable machines to understand natural language, and thus play an indispensable role in many applications. However, existing KGs have poor coverage of concepts, especially fine-grained concepts. To supply existing KGs with more fine-grained and new concepts, we propose a novel concept extraction framework, namely MRC-CE, to extract large-scale multi-granular concepts from the descriptive texts of entities. Specifically, MRC-CE is built with a machine reading comprehension model based on BERT, which can extract more fine-grained concepts with a pointer network. Furthermore, a random forest and rule-based pruning are also adopted to enhance MRC-CE’s precision and recall simultaneously. Our experiments on multilingual KGs, i.e., English Probase and Chinese CN-DBpedia, justify MRC-CE’s superiority over the state-of-the-art extraction models in KG completion. In particular, after running MRC-CE for each entity in CN-DBpedia, more than 7,053,900 new concepts (instanceOf relations) are added to the KG. The code and datasets have been released at https://github.com/fcihraeipnusnacwh/MRC-CE.
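
The abstract does not say how the pointer network's outputs are decoded, so the following is only a minimal sketch of one common way to read multi-granular spans from per-token start and end probabilities. It is not the authors' implementation. The function name `extract_spans`, the `threshold` and `max_len` parameters, the scoring by product, and the hand-made probabilities are all assumptions for illustration; the random forest and rule-based pruning steps are not shown.

```python
from typing import List, Tuple


def extract_spans(
    tokens: List[str],
    start_probs: List[float],
    end_probs: List[float],
    threshold: float = 0.5,   # hypothetical cut-off, not taken from the paper
    max_len: int = 4,         # hypothetical cap on span length in tokens
) -> List[Tuple[str, float]]:
    """Return every span whose start and end probabilities both reach the
    threshold, scored by their product and sorted from high to low score."""
    spans = []
    for i, p_start in enumerate(start_probs):
        if p_start < threshold:
            continue
        # Pair this start with every admissible end at or after it.
        for j in range(i, min(i + max_len, len(tokens))):
            p_end = end_probs[j]
            if p_end < threshold:
                continue
            spans.append((" ".join(tokens[i:j + 1]), p_start * p_end))
    return sorted(spans, key=lambda s: -s[1])


# Hand-made probabilities (assumed, not model output): several start
# positions share one end position, so nested concepts are all returned.
tokens = ["Apple", "is", "an", "American", "technology", "company"]
start_probs = [0.1, 0.0, 0.0, 0.6, 0.8, 0.9]
end_probs = [0.0, 0.0, 0.0, 0.0, 0.1, 0.95]

spans = extract_spans(tokens, start_probs, end_probs)
for text, score in spans:
    print(f"{score:.3f}  {text}")
```

Because several start positions can pair with the same end position, a decoder of this kind can return a coarse concept and its finer-grained variants from the same text, which is the sense of "multi-granular" used in the abstract.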

This work is supported by Science and Technology on Information Systems Engineering Laboratory at the 28th Research Institute of China Electronics Technology Group Corporation, Nanjing Jiangsu, China (No. 05202002), National Key Research and Development Project (No. 2020AAA0109302), Shanghai Science and Technology Innovation Action Plan (No. 19511120400) and Shanghai Municipal Science and Technology Major Project (No. 2021SHZDZX0103).

Notes

1. https://www.meituan.com.
2. http://kw.fudan.edu.cn/cndbpedia.
3. https://www.microsoft.com/en-us/research/project/probase/.
4. https://www.wikipedia.org/.
5. We translate Chinese patterns for CN-DBpedia into English.
6. Prince Station’s abstract text and CE results were translated from Chinese.

Author information

Authors and Affiliations

School of Data Science, Fudan University, Shanghai, China
Siyu Yuan, Deqing Yang, Jingyue Huang & Kaiyan Cao
School of Computer Science, Fudan University, Shanghai, China
Jiaqing Liang, Jilun Sun & Yanghua Xiao
Fudan-Aishu Cognitive Intelligence Joint Research Center, Shanghai, China
Yanghua Xiao
Meituan, Beijing, China
Rui Xie

Corresponding authors

Correspondence to Deqing Yang or Yanghua Xiao.

Editor information

Editors and Affiliations

University of Würzburg, Würzburg, Germany
Andreas Hotho
Linköping University, Linköping, Sweden
Eva Blomqvist
University of Düsseldorf, Düsseldorf, Germany
Stefan Dietze
IBM Research - Thomas J. Watson Research, Hawthorne, CA, USA
Achille Fokoue
University of Texas, Austin, TX, USA
Ying Ding
Imperial College, London, UK
Payam Barnaghi
Australian National University, Canberra, ACT, Australia
Armin Haller
Fondazione Bruno Kessler, Povo, Trento, Italy
Mauro Dragoni
The Open University Walton Hall, Milton Keynes, UK
Harith Alani

About this paper

Cite this paper

Yuan, S. et al. (2021). Large-Scale Multi-granular Concept Extraction Based on Machine Reading Comprehension. In: Hotho, A., et al. The Semantic Web – ISWC 2021. ISWC 2021. Lecture Notes in Computer Science, vol 12922. Springer, Cham. https://doi.org/10.1007/978-3-030-88361-4_6

DOI: https://doi.org/10.1007/978-3-030-88361-4_6
Published: 30 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88360-7
Online ISBN: 978-3-030-88361-4
eBook Packages: Computer Science, Computer Science (R0)
