Detecting Sections and Entities in Court Decisions Using HMM and CRF Graphical Models

  • Gildas Tagny Ngompé
  • Sébastien Harispe
  • Guillaume Zambrano
  • Jacky Montmain
  • Stéphane Mussard
Part of the Studies in Computational Intelligence book series (SCI, volume 834)


Court decisions are legal documents that lawyers analyze carefully in order to understand how judges make decisions. Such analyses provide valuable insight into how the law is applied and support many types of studies. For example, analyzing past decisions may facilitate the handling of future cases and reveal variations in judicial decision-making with respect to specific variables, such as court location. This paper presents results and lessons learned during a project intended to address challenges related to searching and analyzing a large body of French court decisions. In particular, it focuses on a concrete and detailed application of the HMM and CRF sequence labeling models to two tasks: (i) sectioning decisions, and (ii) detecting entities of interest in their content (e.g. locations, dates, participants, rules of law). The effect of several key design and fine-tuning aspects is studied for both tasks. Moreover, the study covers steps that often receive little discussion yet remain critical to the practical application of sequence labeling models: candidate feature definition, selection of good feature subsets, segment representations, and the impact of training dataset size on model performance.
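
To make the notion of segment representation concrete, the following minimal Python sketch encodes entity spans as per-token tags under three commonly used schemes (IO, BIO, IOBES). It is an illustration only: the schemes shown are not necessarily the ones compared in the chapter, and the function name `encode_spans`, the example sentence, and the labels `COURT` and `DATE` are hypothetical.

```python
from typing import List, Tuple

# A span is (start token index, end token index exclusive, entity label).
Span = Tuple[int, int, str]


def encode_spans(n_tokens: int, spans: List[Span], scheme: str = "BIO") -> List[str]:
    """Turn entity spans into one tag per token under the given scheme.

    Hypothetical helper for illustration; tokens outside any span get "O".
    """
    if scheme not in ("IO", "BIO", "IOBES"):
        raise ValueError(f"unknown scheme: {scheme}")
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if not (0 <= start < end <= n_tokens):
            raise ValueError(f"span out of range: {(start, end, label)}")
        for i in range(start, end):
            if scheme == "IO":
                prefix = "I"  # no boundary information at all
            elif scheme == "BIO":
                prefix = "B" if i == start else "I"  # mark the first token
            else:  # IOBES: mark first, last, and single-token segments
                if end - start == 1:
                    prefix = "S"
                elif i == start:
                    prefix = "B"
                elif i == end - 1:
                    prefix = "E"
                else:
                    prefix = "I"
            tags[i] = f"{prefix}-{label}"
    return tags


# Hypothetical tokenized fragment of a decision header.
tokens = ["Cour", "d'appel", "de", "Nîmes", ",", "12", "mars", "2015"]
spans = [(0, 4, "COURT"), (5, 8, "DATE")]

for scheme in ("IO", "BIO", "IOBES"):
    print(scheme, list(zip(tokens, encode_spans(len(tokens), spans, scheme))))
```

Richer schemes such as IOBES give a sequence labeling model explicit boundary tags at the cost of a larger label set, which is one reason the choice of representation can affect performance.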



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Gildas Tagny Ngompé (1, corresponding author)
  • Sébastien Harispe (1)
  • Guillaume Zambrano (2)
  • Jacky Montmain (1)
  • Stéphane Mussard (2)

  1. LGI2P, IMT Mines Alès, Alès, France
  2. CHROME EA 7352, Université de Nîmes, Nîmes, France
