Abstract
Effective management of massive volumes of electronic documents has become a hot topic in social services. Several document datasets, such as PubLayNet and DocBank, have been published for researchers to explore downstream applications. However, these datasets are designed mainly for document layout analysis and do not consider the links among documents. To address this issue, we present an official document knowledge graph, ODKG, which collects official documents for effective management. We design a lightweight ontology of official documents that provides a well-defined schema for the collected documents, so that they can share more links with one another. We present algorithms for element extraction, document archiving, and knowledge alignment in the construction of ODKG, and evaluate these algorithms on datasets we constructed. Experimental results show that several of the algorithms can handle these tasks to some extent. Finally, we describe three use cases of ODKG that can help managers improve the efficiency of their document management.
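The abstract's central idea is that a shared schema lets otherwise isolated documents be linked. The sketch below illustrates that idea only. The `OfficialDocument` fields, the predicate names (`issuedBy`, `cites`, `sameIssuerAs`), and the helper functions are hypothetical assumptions for illustration and are not the ODKG ontology or the authors' construction algorithms. Each document is flattened into (subject, predicate, object) triples. Documents that share a schema field, in this case the issuing agency, are then connected by a derived link.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


# Hypothetical, simplified schema for an official document.
# Field names are illustrative assumptions, not the actual ODKG ontology.
@dataclass(frozen=True)
class OfficialDocument:
    doc_id: str
    title: str
    issuer: str                   # issuing agency
    doc_type: str                 # e.g. "notice", "regulation"
    cites: tuple[str, ...] = ()   # doc_ids of documents this one refers to


def to_triples(doc):
    """Flatten one document into (subject, predicate, object) triples."""
    triples = [
        (doc.doc_id, "hasTitle", doc.title),
        (doc.doc_id, "issuedBy", doc.issuer),
        (doc.doc_id, "hasType", doc.doc_type),
    ]
    # Explicit references between documents become "cites" edges.
    triples += [(doc.doc_id, "cites", target) for target in doc.cites]
    return triples


def shared_issuer_links(docs):
    """Derive a hypothetical 'sameIssuerAs' link for every pair sharing an issuer."""
    by_issuer = defaultdict(list)
    for doc in docs:
        by_issuer[doc.issuer].append(doc.doc_id)
    links = []
    for ids in by_issuer.values():
        # Sorting makes the pair order deterministic.
        links += [(a, "sameIssuerAs", b) for a, b in combinations(sorted(ids), 2)]
    return links


# Toy example with made-up documents.
docs = [
    OfficialDocument("D1", "Notice on data security", "Agency A", "notice"),
    OfficialDocument("D2", "Implementation rules", "Agency A", "regulation", cites=("D1",)),
    OfficialDocument("D3", "Annual work plan", "Agency B", "plan"),
]
graph = [t for d in docs for t in to_triples(d)] + shared_issuer_links(docs)
```

In this toy example, D1 and D2 are connected both by an explicit citation and by a derived same-issuer link, while D3 has no links to the others. A real ontology would define many more classes and relations. Matching values across documents would also require the kind of extraction and alignment steps the abstract mentions.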
References
Zhao, H., Wang, F., Wang, X., Zhang, W., Yang, J.: Research on construction and application of a knowledge discovery system based on intelligent processing of large-scale governmental documents. J. China Soc. Sci. Tech. Inf. 37(8), 805–812 (2018)
Zhong, X., Tang, J., Yepes, A.J.: PubLayNet: largest dataset ever for document layout analysis. In: Proceedings of the 2019 International Conference on Document Analysis and Recognition, pp. 1015–1022. IEEE (2019)
Li, M., et al.: DocBank: a benchmark dataset for document layout analysis. In: Proceedings of the 28th International Conference on Computational Linguistics, pp. 949–960. International Committee on Computational Linguistics (2020)
Cui, C., Shi, Y., Yuan, B., Li, Y., Li, Y., Zhou, C.: Research on relation extraction method for government documents. Comput. Technol. Dev. 31(12), 26–32 (2021)
Ruilin, X., Geng, B., Liu, S.: Research on structural knowledge extraction and organization for multi-modal governmental documents. Syst. Eng. Electron. 44(7), 2241–2250 (2022)
Zhang, Yu., Jun, W.: Research on the construction of science and technology policy knowledge graph. Digital Libr. Forum 8, 31–38 (2021)
Xu, J., et al.: Short text clustering via convolutional neural networks. In: Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing, pp. 62–69 (2015)
Li, J., Sun, A., Han, J., Li, C.: A survey on deep learning for named entity recognition. IEEE Trans. Knowl. Data Eng. 34(1), 50–70 (2022)
Shang, J., Liu, L., Ren, X., Gu, X., Ren, T., Han, J.: Learning named entity tagger using domain-specific dictionary. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 2054–2064. Association for Computational Linguistics (2018)
Shang, J., Liu, J., Jiang, M., Ren, X., Voss, C.R., Han, J.: Automated phrase mining from massive text corpora. IEEE Trans. Knowl. Data Eng. 30(10), 1825–1837 (2018)
Zhang, N., et al.: DeepKE: a deep learning based knowledge extraction toolkit for knowledge base population. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp. 98–108. Association for Computational Linguistics (2022)
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 4171–4186. Association for Computational Linguistics (2019)
Niu, X., Rong, S., Wang, H., Yu, Y.: An effective rule miner for instance matching in a web of data. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 1085–1094. ACM (2012)
Sun, Z., et al.: A benchmarking study of embedding-based entity alignment for knowledge graphs. Proc. VLDB Endow. 13(11), 2326–2340 (2020)
Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710. ACM (2014)
Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., Mei, Q.: LINE: large-scale information network embedding. In: Proceedings of the 24th International Conference on World Wide Web, pp. 1067–1077. ACM (2015)
Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864. ACM (2016)
Li, W., Zhang, B., Xu, L., Wang, M., Luo, A., Niu, Y.: Combining knowledge graph embedding and network embedding for detecting similar mobile applications. In: Zhu, X., Zhang, M., Hong, Yu., He, R. (eds.) NLPCC 2020. LNCS (LNAI), vol. 12430, pp. 256–269. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-60450-9_21
Acknowledgements
This work was supported by the Natural Science Foundation of China (62006125), the Foundation of Jiangsu Provincial Double-Innovation Doctor Program (JSSCBS20210532), the NUPTSF (NY220171), and the Key Research Project of Zhejiang Lab (2022NF0AC01).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Lu, B. et al. (2023). ODKG: An Official Document Knowledge Graph for the Effective Management. In: Wang, H., Han, X., Liu, M., Cheng, G., Liu, Y., Zhang, N. (eds) Knowledge Graph and Semantic Computing: Knowledge Graph Empowers Artificial General Intelligence. CCKS 2023. Communications in Computer and Information Science, vol 1923. Springer, Singapore. https://doi.org/10.1007/978-981-99-7224-1_17
DOI: https://doi.org/10.1007/978-981-99-7224-1_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7223-4
Online ISBN: 978-981-99-7224-1
eBook Packages: Computer Science, Computer Science (R0)