Contrastive Graph Learning with Graph Convolutional Networks

  • Conference paper
  • Document Analysis Systems (DAS 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13237)

Abstract

We introduce a new approach to graph representation learning that improves the performance of graph-based methods for key information extraction (KIE) from document images. Existing methods either use a fixed graph representation or learn one for the given problem; however, the methods that learn a graph representation do not consider node label information, which may result in a sub-optimal graph representation and slow convergence. In this paper, we propose a novel contrastive learning framework for learning the graph representation by leveraging node labels. We present a contrastive graph learning convolutional network (CGLCN), in which the contrastive graph learning framework is combined with a graph convolutional network (GCN) in a unified network architecture. In addition, we create a labeled dataset of receipt images (the Receipt dataset), annotated at the word level rather than at the sentence or word-group level, which makes it well suited for evaluating KIE models. Results on this dataset show the superiority of the proposed contrastive graph learning framework over other baseline methods.
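The core idea the abstract describes, supervising graph representation learning with node labels through a contrastive objective on GCN embeddings, can be illustrated with a minimal sketch. The GCN propagation and supervised contrastive loss below follow the standard formulations (Kipf and Welling; Khosla et al.), not the paper's actual CGLCN architecture; the function names, toy graph, features, weights, and temperature are all illustrative assumptions.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    """One GCN propagation step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])                       # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))   # D^-1/2
    return np.maximum(d_inv_sqrt @ a_hat @ d_inv_sqrt @ feats @ weight, 0.0)

def sup_contrastive_loss(embs, labels, temperature=0.5):
    """Supervised contrastive loss: embeddings of nodes sharing a label are
    pulled together, all other pairs pushed apart (Khosla et al., 2020)."""
    z = embs / np.linalg.norm(embs, axis=1, keepdims=True)   # L2-normalise
    sim = (z @ z.T) / temperature
    n, loss, anchors = len(labels), 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue                                          # no same-label pair
        others = [k for k in range(n) if k != i]
        log_denom = np.log(np.exp(sim[i, others]).sum())
        loss += sum(log_denom - sim[i, j] for j in positives) / len(positives)
        anchors += 1
    return loss / anchors

# Toy document graph: 4 text-box nodes in a chain, two entity classes.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = np.eye(4)                                   # one-hot node features
weight = np.array([[1.0, 0.2], [0.3, 1.0],
                   [1.0, 0.1], [0.2, 1.0]])         # toy layer weights
embs = gcn_layer(adj, feats, weight)
loss = sup_contrastive_loss(embs, labels=[0, 0, 1, 1])
```

In a real KIE pipeline the node features would come from text and layout encoders, the adjacency would itself be learned jointly with the loss rather than fixed, and the contrastive term would be combined with the node-classification objective; this fixed-graph toy only shows the shape of the label-aware contrastive objective.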


Author information

Correspondence to G. Nagendar.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Nagendar, G., Sitaram, R. (2022). Contrastive Graph Learning with Graph Convolutional Networks. In: Uchida, S., Barney, E., Eglin, V. (eds) Document Analysis Systems. DAS 2022. Lecture Notes in Computer Science, vol 13237. Springer, Cham. https://doi.org/10.1007/978-3-031-06555-2_7

  • DOI: https://doi.org/10.1007/978-3-031-06555-2_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06554-5

  • Online ISBN: 978-3-031-06555-2

  • eBook Packages: Computer Science, Computer Science (R0)
