Abstract
We introduce a new approach for graph representation learning that improves the performance of graph-based methods for key information extraction (KIE) from document images. Existing methods either use a fixed graph representation or learn one for the given problem; however, the methods that learn a graph representation do not consider node label information, which may result in a sub-optimal graph representation and slow convergence. In this paper, we propose a novel contrastive learning framework for learning the graph representation by leveraging node label information. We present a contrastive graph learning convolutional network (CGLCN), in which the contrastive graph learning framework is combined with a graph convolutional network (GCN) in a unified network architecture. In addition, we create a labeled dataset of receipt images (the Receipt dataset), annotated at the word level rather than at the sentence or group-of-words level, which makes it well suited for evaluating KIE models. Results on this dataset show the superiority of the proposed contrastive graph learning framework over other baseline methods.
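The abstract only sketches the framework. As a rough, hedged illustration (not the authors' implementation), the two ingredients it names can be combined as below: a standard GCN propagation step, and a label-aware supervised contrastive loss (in the style of Khosla et al.) in which nodes sharing a label act as positives for each other. All function names, the NumPy formulation, and the temperature value are our assumptions for this sketch.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    # One GCN step: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W), Kipf & Welling style.
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt        # symmetric normalization
    return np.maximum(a_norm @ feats @ weight, 0.0)  # ReLU

def supervised_contrastive_loss(embeds, labels, temp=0.5):
    # Nodes with the same label are positives; all other nodes are negatives.
    z = embeds / np.linalg.norm(embeds, axis=1, keepdims=True)  # unit-normalize
    sim = z @ z.T / temp                            # scaled cosine similarities
    n = len(labels)
    loss, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        others = [j for j in range(n) if j != i]
        log_denom = np.log(np.sum(np.exp(sim[i, others])))
        # Average InfoNCE-style term over this anchor's positives.
        loss -= np.mean([sim[i, j] - log_denom for j in positives])
        anchors += 1
    return loss / max(anchors, 1)
```

In a unified architecture of this kind, the contrastive loss would shape the node embeddings (and hence the learned graph) while a conventional classification loss on the GCN output handles the KIE labels; how the two are weighted is a design choice not specified in the abstract.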
Copyright information
© 2022 Springer Nature Switzerland AG
Cite this paper
Nagendar, G., Sitaram, R. (2022). Contrastive Graph Learning with Graph Convolutional Networks. In: Uchida, S., Barney, E., Eglin, V. (eds) Document Analysis Systems. DAS 2022. Lecture Notes in Computer Science, vol 13237. Springer, Cham. https://doi.org/10.1007/978-3-031-06555-2_7
Print ISBN: 978-3-031-06554-5
Online ISBN: 978-3-031-06555-2