Predicting applicable law sections from judicial case reports using legislative text analysis with machine learning


This paper presents a study on legislative text analysis that automates the identification of the law sections applicable to a given case. We propose a methodology that combines supervised machine learning (ML) and natural language processing (NLP), and demonstrate it on archived case studies under the Indian Income Tax Act of 1961 (Income tax act, 1961 complete act, bare act, 2008), with applicable law sections and subsections, available at the ‘LegalCrystal’ data repository. We treat the problem as a multi-label classification task, in which multiple law sections can apply to a single case. A one-versus-rest wrapper is applied over conventional ML models (logistic regression, naïve Bayes, decision tree and support vector machine) to perform the multi-label classification. The proposed methodology includes the necessary preprocessing and word embedding of texts, pipelining of transformers and ML models, and evaluation of the trained models. We analyzed the performance of these ML models while tuning their hyper-parameters and observed the highest F1 score, 0.75, for the support vector machine. Although this work is limited to cases involving income tax laws, the proposed methodology can be adapted to other law sections.
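
As a minimal sketch of the multi-label framing described in the abstract, the snippet below shows how per-case sets of law sections reduce to one binary target vector per section (the one-versus-rest decomposition) and how an F1 score can be computed over set-valued predictions. The section labels (`S143`, `S147`, `S80`), the toy predictions, and the function names are hypothetical. The abstract does not state which F1 averaging was used, so micro-averaging here is an assumption, not the authors' reported procedure.

```python
from typing import Dict, List, Set


def one_vs_rest_targets(label_sets: List[Set[str]]) -> Dict[str, List[int]]:
    """Turn per-case label sets into one binary target vector per law section."""
    sections = sorted(set().union(*label_sets))
    return {s: [int(s in labels) for labels in label_sets] for s in sections}


def micro_f1(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Micro-averaged F1: pool true/false positives over all cases and sections."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical section labels for three cases (illustrative only, not paper data).
true_sets = [{"S143", "S147"}, {"S80"}, {"S143"}]
pred_sets = [{"S143"}, {"S80", "S147"}, {"S143", "S80"}]

# One binary problem per section; a separate binary classifier would be
# trained on each of these target vectors in a one-versus-rest setup.
targets = one_vs_rest_targets(true_sets)
score = micro_f1(true_sets, pred_sets)
print(targets)
print(round(score, 3))
```

In a one-versus-rest setup, each target vector in `targets` would serve as the training signal for an independent binary classifier, and the per-section predictions would be merged back into one label set per case before scoring.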


  1. Income tax act. (2008). 1961 complete act. bare act.

  2. National judicial data grid.

  3. LegalCrystal website.

  4. Surden, H. (2014). Machine learning and law. Washington Law Review, 89, 87.

  5. Virtucio, M. B. L., Aborot, J. A., Abonita, J. K. C., Avinante, R. S., Copino, R. J. B., Neverida, M. P., & Tan, G. B. A. (2018). Predicting decisions of the Philippine Supreme Court using natural language processing and machine learning. In 2018 IEEE 42nd annual computer software and applications conference (COMPSAC) (Vol. 2, pp. 130–135). IEEE.

  6. Francesconi, E., & Passerini, A. (2007). Automatic classification of provisions in legislative texts. Artificial Intelligence and Law, 15(1), 1–17.

  7. Islam, M. A., & Haque, M. J. (2018). Evaluating document analysis with KNN based approaches in judicial offices of Bangladesh. In 2018 second international conference on computing methodologies and communication (ICCMC) (pp. 646–650). IEEE.

  8. Liu, Z., & Chen, H. (2017). A predictive performance comparison of machine learning models for judicial cases. In 2017 IEEE symposium series on computational intelligence (SSCI) (pp. 1–6). IEEE.

  9. Waltl, B., Bonczek, G., Scepankova, E., Landthaler, J., & Matthes, F. (2017). Predicting the outcome of appeal decisions in Germany’s tax law. In International conference on electronic participation (pp. 89–99). Cham: Springer.

  10. Aletras, N., Tsarapatsanis, D., Preoţiuc-Pietro, D., & Lampos, V. (2016). Predicting judicial decisions of the European Court of Human Rights: A natural language processing perspective. PeerJ Computer Science, 2, e93.

  11. Medvedeva, M., Vols, M., & Wieling, M. (2020). Using machine learning to predict decisions of the European Court of Human Rights. Artificial Intelligence and Law, 28(2), 237–266.

  12. Richardson, L. (2020). Beautiful soup documentation.

  13. Xu, J. (2011). An extended one-versus-rest support vector machine for multi-label classification. Neurocomputing, 74(17), 3114–3124.

  14. Goldberg, Y., & Levy, O. (2014). word2vec Explained: Deriving Mikolov et al.'s negative-sampling word-embedding method. arXiv:1402.3722.

  15. Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532–1543).

  16. Ketkar, N. (2017). Introduction to Keras. In Deep learning with Python (pp. 97–111). Berkeley, CA: Apress.

  17. Sorower, M. S. (2010). A literature survey on algorithms for multi-label learning. Oregon State University, Corvallis, 18, 1–25.

  18. Godbole, S., & Sarawagi, S. (2004). Discriminative methods for multi-labeled classification. In Pacific-Asia conference on knowledge discovery and data mining (pp. 22–30). Berlin: Springer.

Author information

Corresponding author

Correspondence to Souvik Sengupta.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sengupta, S., Dave, V. Predicting applicable law sections from judicial case reports using legislative text analysis with machine learning. J Comput Soc Sc 5, 503–516 (2022).


Keywords

  • Natural language processing
  • Machine learning
  • Multi-label classification
  • Legal text analysis