ODKG: An Official Document Knowledge Graph for the Effective Management

  • Conference paper
Knowledge Graph and Semantic Computing: Knowledge Graph Empowers Artificial General Intelligence (CCKS 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1923)

Abstract

Effective management of massive electronic documents is a hot topic in social services. Several document knowledge bases, such as PubLayNet and DocBank, have been published for researchers to explore downstream applications. Nevertheless, these datasets are mainly designed for document layout analysis and do not consider the linkages among documents. To address this issue, we present an official document knowledge graph, namely ODKG, which aims to collect official documents for effective management. We design a lightweight ontology of official documents, which provides a well-defined schema for the collected documents so that they can share more linkages with each other. We present algorithms for element extraction, document archiving, and knowledge alignment used during ODKG construction, and evaluate them on our constructed datasets. Experimental results show that several algorithms are competent for these tasks to some extent. Finally, we list three use cases of ODKG that help managers improve the efficiency of their document management.

Notes

  1. https://protege.stanford.edu/

  2. https://www.miit.gov.cn/

  3. https://nnsa.mee.gov.cn/

  4. https://www.most.gov.cn/

  5. http://www.gov.cn/

  6. https://github.com/NLPScott/bert-Chinese-classification-task

  7. http://jena.apache.org/

  8. https://neo4j.com/

  9. https://www.drools.org/

References

  1. Zhao, H., Wang, F., Wang, X., Zhang, W., Yang, J.: Research on construction and application of a knowledge discovery system based on intelligent processing of large-scale governmental documents. J. China Soc. Sci. Tech. Inf. 37(8), 805–812 (2018)

  2. Zhong, X., Tang, J., Yepes, A.J.: PubLayNet: largest dataset ever for document layout analysis. In: Proceedings of the 2019 International Conference on Document Analysis and Recognition, pp. 1015–1022. IEEE (2019)

  3. Li, M., et al.: DocBank: a benchmark dataset for document layout analysis. In: Proceedings of the 28th International Conference on Computational Linguistics, pp. 949–960. International Committee on Computational Linguistics (2020)

  4. Cui, C., Shi, Y., Yuan, B., Li, Y., Li, Y., Zhou, C.: Research on relation extraction method for government documents. Comput. Technol. Dev. 31(12), 26–32 (2021)

  5. Ruilin, X., Geng, B., Liu, S.: Research on structural knowledge extraction and organization for multi-modal governmental documents. Syst. Eng. Electron. 44(7), 2241–2250 (2022)

  6. Zhang, Yu., Jun, W.: Research on the construction of science and technology policy knowledge graph. Digital Libr. Forum 8, 31–38 (2021)

  7. Xu, J., et al.: Short text clustering via convolutional neural networks. In: Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing, pp. 62–69 (2015)

  8. Li, J., Sun, A., Han, J., Li, C.: A survey on deep learning for named entity recognition. IEEE Trans. Knowl. Data Eng. 34(1), 50–70 (2022)

  9. Shang, J., Liu, L., Ren, X., Gu, X., Ren, T., Han, J.: Learning named entity tagger using domain-specific dictionary. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2054–2064. Association for Computational Linguistics (2018)

  10. Shang, J., Liu, J., Jiang, M., Ren, X., Voss, C.R., Han, J.: Automated phrase mining from massive text corpora. IEEE Trans. Knowl. Data Eng. 30(10), 1825–1837 (2018)

  11. Zhang, N., et al.: DeepKE: a deep learning based knowledge extraction toolkit for knowledge base population. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp. 98–108. Association for Computational Linguistics (2022)

  12. Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 4171–4186. Association for Computational Linguistics (2019)

  13. Niu, X., Rong, S., Wang, H., Yu, Y.: An effective rule miner for instance matching in a web of data. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 1085–1094. ACM (2012)

  14. Sun, Z., et al.: A benchmarking study of embedding-based entity alignment for knowledge graphs. Proc. VLDB Endow. 13(11), 2326–2340 (2020)

  15. Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710. ACM (2014)

  16. Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., Mei, Q.: LINE: large-scale information network embedding. In: Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077. ACM (2015)

  17. Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864. ACM (2016)

  18. Li, W., Zhang, B., Xu, L., Wang, M., Luo, A., Niu, Y.: Combining knowledge graph embedding and network embedding for detecting similar mobile applications. In: Zhu, X., Zhang, M., Hong, Yu., He, R. (eds.) NLPCC 2020. LNCS (LNAI), vol. 12430, pp. 256–269. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-60450-9_21

Acknowledgements

This work was supported by the Natural Science Foundation of China (62006125), the Foundation of Jiangsu Provincial Double-Innovation Doctor Program (JSSCBS20210532), the NUPTSF (NY220171), and the Key Research Project of Zhejiang Lab (2022NF0AC01).

Author information

Corresponding author

Correspondence to Weizhuo Li.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Lu, B. et al. (2023). ODKG: An Official Document Knowledge Graph for the Effective Management. In: Wang, H., Han, X., Liu, M., Cheng, G., Liu, Y., Zhang, N. (eds) Knowledge Graph and Semantic Computing: Knowledge Graph Empowers Artificial General Intelligence. CCKS 2023. Communications in Computer and Information Science, vol 1923. Springer, Singapore. https://doi.org/10.1007/978-981-99-7224-1_17

  • DOI: https://doi.org/10.1007/978-981-99-7224-1_17

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-7223-4

  • Online ISBN: 978-981-99-7224-1

  • eBook Packages: Computer Science, Computer Science (R0)