
Large-Scale Multi-granular Concept Extraction Based on Machine Reading Comprehension

  • Conference paper
  • First Online:
The Semantic Web – ISWC 2021 (ISWC 2021)

Abstract

The concepts in knowledge graphs (KGs) enable machines to understand natural language and thus play an indispensable role in many applications. However, existing KGs have poor coverage of concepts, especially fine-grained ones. To supplement existing KGs with new and more fine-grained concepts, we propose a novel concept extraction framework, MRC-CE, which extracts large-scale multi-granular concepts from the descriptive texts of entities. Specifically, MRC-CE is built on a BERT-based machine reading comprehension model that uses a pointer network to extract more fine-grained concepts. Furthermore, a random forest and rule-based pruning are adopted to improve MRC-CE’s precision and recall simultaneously. Experiments on multilingual KGs, namely the English Probase and the Chinese CN-DBpedia, demonstrate MRC-CE’s superiority over state-of-the-art extraction models in KG completion. In particular, after MRC-CE was run on every entity in CN-DBpedia, more than 7,053,900 new concepts (instanceOf relations) were added to the KG. The code and datasets have been released at https://github.com/fcihraeipnusnacwh/MRC-CE.
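To make the machine reading comprehension framing concrete, the following is a minimal sketch of one common way to decode concept spans from per-token start and end probabilities, in the style of MRC-based span extraction. It is not MRC-CE's released code: the helper names `build_mrc_input` and `decode_spans`, the 0.5 threshold, the `max_len` cap, the nearest-end pairing rule, the example sentence, and the probability values are all hypothetical assumptions chosen for illustration. In a real system the probabilities would come from a BERT encoder, and the tokenizer would insert the `[CLS]`/`[SEP]` markers itself.

```python
from typing import List, Tuple


def build_mrc_input(query: str, description: str) -> str:
    """Schematically pack a query and an entity description into
    BERT's paired-sequence layout (hypothetical helper)."""
    return f"[CLS] {query} [SEP] {description} [SEP]"


def decode_spans(tokens: List[str],
                 start_probs: List[float],
                 end_probs: List[float],
                 threshold: float = 0.5,
                 max_len: int = 10) -> List[Tuple[int, int, str]]:
    """Pair every predicted start with the nearest predicted end at or
    after it (assumed decoding rule). Returns (start, end, text) triples
    with inclusive token indices."""
    # Tokens whose start/end probability clears the (assumed) threshold.
    starts = [i for i, p in enumerate(start_probs) if p >= threshold]
    ends = [j for j, p in enumerate(end_probs) if p >= threshold]
    spans = []
    for s in starts:
        # `ends` is ascending, so the first candidate is the nearest end
        # within max_len tokens of this start.
        candidates = [e for e in ends if s <= e < s + max_len]
        if candidates:
            e = candidates[0]
            spans.append((s, e, " ".join(tokens[s:e + 1])))
    return spans


if __name__ == "__main__":
    tokens = ["Apple", "is", "an", "American", "technology", "company"]
    # Toy values standing in for model output, not real predictions.
    start_probs = [0.1, 0.0, 0.1, 0.9, 0.2, 0.8]
    end_probs = [0.0, 0.0, 0.0, 0.1, 0.3, 0.95]
    print(build_mrc_input("What is the concept of Apple?", " ".join(tokens)))
    print(decode_spans(tokens, start_probs, end_probs))
    # [(3, 5, 'American technology company'), (5, 5, 'company')]
```

Because starts and ends are predicted independently, two starts can share one end, so a single sentence can yield both a fine-grained concept ("American technology company") and a coarser one ("company"). This is one way a pointer-style decoder can produce multi-granular output; the paper's actual decoding and its random-forest and rule-based pruning stages may differ.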

This work is supported by Science and Technology on Information Systems Engineering Laboratory at the 28th Research Institute of China Electronics Technology Group Corporation, Nanjing, Jiangsu, China (No. 05202002), National Key Research and Development Project (No. 2020AAA0109302), Shanghai Science and Technology Innovation Action Plan (No. 19511120400) and Shanghai Municipal Science and Technology Major Project (No. 2021SHZDZX0103).

Notes

  1. https://www.meituan.com.

  2. http://kw.fudan.edu.cn/cndbpedia.

  3. https://www.microsoft.com/en-us/research/project/probase/.

  4. https://www.wikipedia.org/.

  5. We translate Chinese patterns for CN-DBpedia into English.

  6. Prince Station’s abstract text and CE results were translated from Chinese.

Author information

Correspondence to Deqing Yang or Yanghua Xiao.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Yuan, S. et al. (2021). Large-Scale Multi-granular Concept Extraction Based on Machine Reading Comprehension. In: Hotho, A., et al. The Semantic Web – ISWC 2021. ISWC 2021. Lecture Notes in Computer Science, vol 12922. Springer, Cham. https://doi.org/10.1007/978-3-030-88361-4_6

  • DOI: https://doi.org/10.1007/978-3-030-88361-4_6

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88360-7

  • Online ISBN: 978-3-030-88361-4

  • eBook Packages: Computer Science, Computer Science (R0)