Abstract
Most administrative documents take a semi-structured form (invoices, payslips, etc.). Extracting information from such documents remains challenging because their structure varies with the layout style of each issuing administration. In this work, we address this variation with a multi-layer Graph Attention Network (GAT). We propose a general structure for semi-structured documents and, based on it, adopt a star sub-graph that exploits the context surrounding each word, so that neighboring words help locate and rank the searched words. The GAT exploits this neighborhood and highlights the important neighboring words, which are likely to be identified more reliably. Each graph node carries both textual and visual features. We evaluate the multi-layer GAT on three datasets: artificially generated invoices and payslips, and receipts from the SROIE ICDAR competition. On the latter dataset, we obtain an F1 score of 0.892.
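To make the star sub-graph idea concrete, the following is a minimal sketch of one attention head applied to a single word and its neighbors. It uses the standard GAT scoring rule (LeakyReLU over a learned vector applied to the concatenated projected features, followed by a softmax), not necessarily the exact configuration used in this work. The neighbor rule (k nearest word centers), the toy coordinates, the two-dimensional features, and the weights `W` and `a` are all assumptions chosen for illustration; the abstract does not specify them.

```python
import math

def k_nearest(words, center, k):
    """Indices of the k words closest to words[center].
    Assumed neighbor rule for the star sub-graph, for illustration only."""
    cx, cy = words[center]["xy"]
    others = [i for i in range(len(words)) if i != center]
    others.sort(key=lambda i: math.dist((cx, cy), words[i]["xy"]))
    return others[:k]

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def gat_head(h, center, neighbors, W, a):
    """One GAT attention head over one star: center word plus its neighbors.
    Returns the attention weight per node and the updated center feature."""
    nodes = [center] + list(neighbors)       # self-loop included
    z = {i: matvec(W, h[i]) for i in nodes}  # shared linear projection
    # Unnormalized score: LeakyReLU(a . [z_center || z_j])
    scores = [leaky_relu(sum(p * q for p, q in zip(a, z[center] + z[j])))
              for j in nodes]
    m = max(scores)                          # subtract max for a stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    # Updated feature: attention-weighted sum of projected features
    out = [sum(al * z[j][d] for al, j in zip(alpha, nodes))
           for d in range(len(W))]
    return dict(zip(nodes, alpha)), out

# Toy receipt: word text and center coordinates (hypothetical values).
words = [
    {"text": "Total",      "xy": (10.0, 50.0)},
    {"text": "12.50",      "xy": (60.0, 50.0)},
    {"text": "Date",       "xy": (10.0, 10.0)},
    {"text": "2021-01-05", "xy": (60.0, 10.0)},
    {"text": "Thanks",     "xy": (35.0, 120.0)},
]
# Placeholder node features; in practice these would combine textual and visual cues.
h = [[1.0, 0.0], [0.2, 0.9], [0.9, 0.1], [0.1, 0.8], [0.5, 0.5]]
W = [[1.0, 0.0], [0.0, 1.0]]   # projection matrix (assumed values)
a = [0.5, -0.3, 0.8, 0.1]      # attention vector, length 2 * output dimension

neighbors = k_nearest(words, center=1, k=2)
alpha, updated = gat_head(h, 1, neighbors, W, a)
for idx, weight in alpha.items():
    print(words[idx]["text"], round(weight, 3))
```

The printed weights show how much each word in the star contributes to the updated representation of the center word "12.50". In a trained model, a keyword such as "Total" would be expected to receive a high weight when the center word is the total amount.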
Supported by BPI DeepTech.
Acknowledgements
This work was carried out within the framework of the BPI DeepTech project, a partnership between the University of Lorraine (Ref. UL: GECO/2020/00331), the CNRS, INRIA Lorraine, and the company FAIR&SMART. The authors thank all the partners for their fruitful collaboration.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Belhadj, D., Belaïd, Y., Belaïd, A. (2021). Consideration of the Word’s Neighborhood in GATs for Information Extraction in Semi-structured Documents. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12822. Springer, Cham. https://doi.org/10.1007/978-3-030-86331-9_55
DOI: https://doi.org/10.1007/978-3-030-86331-9_55
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86330-2
Online ISBN: 978-3-030-86331-9
eBook Packages: Computer Science, Computer Science (R0)