Using Topic Modeling in Classification of Brazilian Lawsuits

Aguiar, André; Silveira, Raquel; Furtado, Vasco; Pinheiro, Vládia; Neto, João A. Monteiro

doi:10.1007/978-3-030-98305-5_22

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13208)

Included in the following conference series:

International Conference on Computational Processing of the Portuguese Language

Abstract

Legal text processing is challenging for modeling approaches because of peculiarities of legal documents, such as their length and technical vocabulary. Topic modeling consists of discovering the semantic structure of a text. This paper investigates the use of topic modeling, together with information about the legislation cited, to identify the subject of legal documents, and evaluates its applicability to the classification of Brazilian lawsuits. The models were trained on a Golden Collection of 16 thousand initial petitions and indictments from the Court of Justice of the State of Ceará, in Brazil, whose lawsuits were classified into the five most representative classes of the National Council of Justice (CNJ) of Brazil: Common Civil Procedure, Execution of Extrajudicial Title, Criminal Action - Ordinary Procedure, Special Civil Court Procedure, and Tax Enforcement. The results outperform the baseline, achieving a macro F1 score of 0.89. Our interpretation is that representing documents through contextual embeddings generated by BERT, together with the model's bidirectional architecture, makes it possible to capture the domain-specific context of legal documents. Thus, including the legislation mentioned in the document representation can improve the accuracy of the classification task.
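
The abstract reports its headline result as a macro F1 score over the five CNJ classes. As a reminder of what that metric measures, the following is a minimal sketch in plain Python that computes the unweighted mean of per-class F1 scores. The helper name `macro_f1` and the toy labels are assumptions made for illustration; they are not the authors' code or data, and the printed value has no relation to the 0.89 reported above.

```python
from collections import Counter

# The five CNJ classes named in the abstract.
CLASSES = [
    "Common Civil Procedure",
    "Execution of Extrajudicial Title",
    "Criminal Action - Ordinary Procedure",
    "Special Civil Court Procedure",
    "Tax Enforcement",
]


def macro_f1(y_true, y_pred, labels):
    """Return the unweighted mean of the per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(labels)


# Toy labels invented for illustration only (hypothetical, not the paper's data).
y_true = [CLASSES[0], CLASSES[0], CLASSES[1], CLASSES[2], CLASSES[3], CLASSES[4]]
y_pred = [CLASSES[0], CLASSES[1], CLASSES[1], CLASSES[2], CLASSES[3], CLASSES[4]]

score = macro_f1(y_true, y_pred, CLASSES)
print(round(score, 3))  # 0.867
```

Because every class contributes equally to the average, macro F1 penalizes a classifier that performs well only on the most frequent lawsuit classes, which is presumably why it suits a multi-class setting like this one.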

Notes

1. https://www.cnj.jus.br/sgt/consulta_publica_classes.php
2. Lemmatization and PoS tagging were based on what is available in the spaCy library for the Portuguese language (https://spacy.io/).
3. https://xgboost.readthedocs.io/en/latest/python/python_api.html

Author information

Authors and Affiliations

University of Fortaleza, Fortaleza, Brazil
André Aguiar, Vasco Furtado, Vládia Pinheiro & João A. Monteiro Neto
Federal Institute of Education, Science and Technology of Ceará, Fortaleza, Brazil
Raquel Silveira
ETICE - Information Technology Company of Ceará, Fortaleza, Brazil
Vasco Furtado

Corresponding author

Correspondence to André Aguiar.

Editor information

Editors and Affiliations

Universidade de Fortaleza, Fortaleza, Brazil
Vládia Pinheiro
CiTIUS - Universidade de Santiago de Compostela, Santiago de Compostela, Spain
Pablo Gamallo
Universidade Nova de Lisboa, Lisbon, Portugal
Raquel Amaro
University of Sheffield, Sheffield, UK
Carolina Scarton
INESC-ID, Lisbon, Portugal
Fernando Batista
Federal University of São Carlos, São Carlos, Brazil
Diego Silva
University of Lisbon, Lisbon, Portugal
Catarina Magro
Sentimonitor, Porto Alegre, Brazil
Hugo Pinto

About this paper

Cite this paper

Aguiar, A., Silveira, R., Furtado, V., Pinheiro, V., Neto, J.A.M. (2022). Using Topic Modeling in Classification of Brazilian Lawsuits. In: Pinheiro, V., et al. (eds.) Computational Processing of the Portuguese Language. PROPOR 2022. Lecture Notes in Computer Science, vol 13208. Springer, Cham. https://doi.org/10.1007/978-3-030-98305-5_22

DOI: https://doi.org/10.1007/978-3-030-98305-5_22
Published: 16 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98304-8
Online ISBN: 978-3-030-98305-5
eBook Packages: Computer Science, Computer Science (R0)