Artificial Intelligence and Law, Volume 27, Issue 2, pp 227–251

Semi-automatic knowledge population in a legal document management system

  • Guido Boella
  • Luigi Di Caro (corresponding author)
  • Valentina Leone


Every organization has to deal with operational risks arising from the execution of its primary business functions. In this paper, we describe a legal knowledge management system which helps users understand the meaning of legislative text and the relationships between norms. While much of the knowledge requires the input of legal experts, we focus in this article on natural language processing (NLP) applications that semi-automate essential but time-consuming, lower-skill tasks: classifying legal documents, identifying cross-references and legislative amendments, linking legal terms to the most relevant definitions, and extracting key elements of legal provisions to improve clarity and enable advanced search options. Using NLP tools to semi-automate such tasks makes the proposal a realistic commercial prospect, as it helps keep costs down while allowing greater coverage.



Copyright information

© Springer Nature B.V. 2018

Authors and Affiliations

  1. Department of Computer Science, University of Turin, Turin, Italy
