A Novel Approach of Augmenting Training Data for Legal Text Segmentation by Leveraging Domain Knowledge

  • Rupali Sunil Wagh
  • Deepa Anand
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 910)


In this era of information overload, text segmentation can be used effectively to locate and extract information specific to users’ needs within huge collections of documents. Text segmentation refers to the task of dividing a document into smaller, labeled text fragments according to the semantic commonality of their contents. Because legal text carries rich semantic information, text segmentation is crucial for information retrieval in the legal domain. However, when segmentation is treated as a supervised classification task, building an efficient classifier requires a large amount of training data, and collecting and manually annotating gold standards in NLP is very expensive. In the recent past, the question of whether such gold standards can be satisfactorily replaced with automatically annotated data has attracted growing interest. This work presents two approaches, based entirely on domain knowledge, for automatically generating training data that can then be used to segment court judgments.
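The abstract does not spell out the two approaches. As a rough illustration of the general idea of producing automatically annotated training data from domain knowledge, the following is a minimal sketch of rule-based weak labeling: each sentence of a judgment is assigned a segment label when it contains a hand-written cue phrase. This is a generic, swapped-in technique, not the authors' method. The segment labels (`facts`, `arguments`, `decision`), every cue phrase, and the helper names `split_sentences`, `label_sentence` and `generate_training_data` are hypothetical placeholders chosen for the example.

```python
import re
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL segment labels and cue phrases; NOT taken from the paper.
# In practice such cues would come from legal domain experts.
CUE_PHRASES: Dict[str, List[str]] = {
    "facts": ["the facts of the case", "brief facts", "the appellant was"],
    "arguments": ["learned counsel", "it was contended", "submitted that"],
    "decision": ["appeal is dismissed", "appeal is allowed", "we hold that"],
}


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '?' or '!' followed by whitespace."""
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p for p in parts if p]


def label_sentence(sentence: str) -> Optional[str]:
    """Return the first label whose cue phrase occurs in the sentence, else None."""
    lowered = sentence.lower()
    for label, cues in CUE_PHRASES.items():  # dicts keep insertion order
        if any(cue in lowered for cue in cues):
            return label
    return None  # no cue matched: leave the sentence unlabeled rather than guess


def generate_training_data(judgment: str) -> List[Tuple[str, str]]:
    """Produce (sentence, label) pairs usable as automatically annotated training data."""
    pairs: List[Tuple[str, str]] = []
    for sentence in split_sentences(judgment):
        label = label_sentence(sentence)
        if label is not None:
            pairs.append((sentence, label))
    return pairs


if __name__ == "__main__":
    sample = (
        "Brief facts are as follows. The appellant was employed as a clerk. "
        "Learned counsel for the respondent submitted that the order was valid. "
        "Accordingly, the appeal is dismissed."
    )
    for sentence, label in generate_training_data(sample):
        print(label, "|", sentence)
```

Sentences that match no cue are dropped instead of being forced into a segment, trading coverage for label precision. The intent of such a scheme is that a supervised classifier trained on the resulting pairs may then generalize to sentences the rules miss.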


Keywords: Natural language processing, Legal text segmentation, Legal information retrieval, Supervised learning, Generation of training dataset



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Jain University, Bangalore, India
  2. CMR Institute of Technology, Bangalore, India
