Abstract
We introduce a new approach for graph representation learning that improves the performance of graph-based methods for key information extraction (KIE) from document images. Existing methods either use a fixed graph representation or learn one for the given problem; however, the methods that learn a graph representation do not consider node label information, which may result in a sub-optimal graph representation and slow convergence. In this paper, we propose a novel contrastive learning framework for learning the graph representation by leveraging node label information. We present a contrastive graph learning convolutional network (CGLCN), in which the contrastive graph learning framework is combined with a graph convolutional network (GCN) in a unified network architecture. In addition, we create a labeled dataset of receipt images (the Receipt dataset), annotated at the word level rather than at the sentence or group-of-words level, which makes it well suited for evaluating KIE models. Results on this dataset show the superiority of the proposed contrastive graph learning framework over other baseline methods.
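The abstract only sketches the framework. As a rough, hedged illustration (not the authors' implementation), the two ingredients it names can be combined as below: a standard GCN propagation step, and a label-aware supervised contrastive loss (in the style of Khosla et al.) in which nodes sharing a label act as positives for each other. All function names, the NumPy formulation, and the temperature value are our assumptions for this sketch.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    # One GCN step: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W), Kipf & Welling style.
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt        # symmetric normalization
    return np.maximum(a_norm @ feats @ weight, 0.0)  # ReLU

def supervised_contrastive_loss(embeds, labels, temp=0.5):
    # Nodes with the same label are positives; all other nodes are negatives.
    z = embeds / np.linalg.norm(embeds, axis=1, keepdims=True)  # unit-normalize
    sim = z @ z.T / temp                            # scaled cosine similarities
    n = len(labels)
    loss, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        others = [j for j in range(n) if j != i]
        log_denom = np.log(np.sum(np.exp(sim[i, others])))
        # Average InfoNCE-style term over this anchor's positives.
        loss -= np.mean([sim[i, j] - log_denom for j in positives])
        anchors += 1
    return loss / max(anchors, 1)
```

In a unified architecture of this kind, the contrastive loss would shape the node embeddings (and hence the learned graph) while a conventional classification loss on the GCN output handles the KIE labels; how the two are weighted is a design choice not specified in the abstract.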
Copyright information
© 2022 Springer Nature Switzerland AG
Cite this paper
Nagendar, G., Sitaram, R. (2022). Contrastive Graph Learning with Graph Convolutional Networks. In: Uchida, S., Barney, E., Eglin, V. (eds) Document Analysis Systems. DAS 2022. Lecture Notes in Computer Science, vol 13237. Springer, Cham. https://doi.org/10.1007/978-3-031-06555-2_7
Print ISBN: 978-3-031-06554-5
Online ISBN: 978-3-031-06555-2