Abstract
The Brazilian Supreme Court receives tens of thousands of cases each semester. Court employees spend thousands of hours on the initial analysis and classification of those cases, which takes effort away from later, more complex stages of the case-management workflow. In this paper, we explore multimodal classification of documents from Brazil's Supreme Court. We train and evaluate our methods on a novel multimodal dataset of 6510 lawsuits (339,478 pages) with manual annotation assigning each page to one of six classes. Each lawsuit is an ordered sequence of pages, each stored both as an image and as text extracted through optical character recognition. We first train two unimodal classifiers: a ResNet pre-trained on ImageNet is fine-tuned on the images, and a convolutional network with filters of multiple kernel sizes is trained from scratch on the document texts. We use them as extractors of visual and textual features, which are then combined through our proposed fusion module. The fusion module can handle missing textual or visual input by substituting learned embeddings for the missing data. Moreover, we experiment with bidirectional long short-term memory (biLSTM) networks and linear-chain conditional random fields to model the sequential nature of the pages. The multimodal approaches outperform both the textual and the visual classifiers, especially when leveraging the sequential nature of the pages.
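To illustrate the idea of handling missing modalities with learned placeholder embeddings, the following is a minimal sketch assuming the fusion simply concatenates the two feature vectors; the dimensions, the `fuse` helper, and the random stand-in values are illustrative, not the paper's actual architecture, in which the placeholder embeddings are learned during training.

```python
import numpy as np

rng = np.random.default_rng(0)
VIS_DIM, TXT_DIM = 4, 3  # toy feature sizes, for illustration only

# Placeholder embeddings substituted when a modality is absent.
# In the paper these are learned jointly with the model; here they
# are fixed random vectors so the example is self-contained.
missing_visual = rng.normal(size=VIS_DIM)
missing_textual = rng.normal(size=TXT_DIM)

def fuse(visual, textual):
    """Concatenate visual and textual features, falling back to the
    placeholder embedding when a modality is missing (None)."""
    v = visual if visual is not None else missing_visual
    t = textual if textual is not None else missing_textual
    return np.concatenate([v, t])

page_full = fuse(rng.normal(size=VIS_DIM), rng.normal(size=TXT_DIM))
page_no_text = fuse(rng.normal(size=VIS_DIM), None)  # e.g., OCR produced no text

# Every page yields a fused vector of the same shape, so downstream
# sequence models (biLSTM, CRF) receive a uniform input regardless of
# which modalities were available.
assert page_full.shape == page_no_text.shape == (VIS_DIM + TXT_DIM,)
```

The benefit of learned placeholders over zero vectors is that the model can learn a representation for "this modality is absent" rather than treating absence as an all-zero signal.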
Data availability
Data used in this work is available at http://ailab.unb.br/victor/lrec2020/.
Code availability
Code used in this work is available at https://github.com/peluz/victor-visual-text.
Notes
To the best of our knowledge.
Funding
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001. TdC received support from Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), grant PQ 314154/2018-3. We acknowledge the support of “Projeto de Pesquisa & Desenvolvimento de aprendizado de máquina (machine learning) sobre dados judiciais das repercussões gerais do Supremo Tribunal Federal - STF”. We are also grateful for the support from Fundação de Apoio à Pesquisa do Distrito Federal (FAPDF, project KnEDLe, convênio 07/2019) and Fundação de Empreendimentos Científicos e Tecnológicos (Finatec). TdC is currently on a leave of absence from the University of Brasilia and works at Vicon Motion Systems, Oxford Metrics Group.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Luz de Araujo, P.H., de Almeida, A.P.G.S., Ataides Braz, F. et al. Sequence-aware multimodal page classification of Brazilian legal documents. IJDAR 26, 33–49 (2023). https://doi.org/10.1007/s10032-022-00406-7