The automated identification of national implementations (NIMs) of European directives by text similarity techniques has shown promising preliminary results. Previous work has proposed and used unsupervised lexical and semantic similarity techniques based on vector space models, latent semantic analysis and topic models. However, these techniques were evaluated only on a small multilingual corpus of directives and NIMs. In this paper, we use word and paragraph embedding models learned by shallow neural networks from a multilingual legal corpus of European directives and national legislation (from Ireland, Luxembourg and Italy) to develop unsupervised semantic similarity systems to identify transpositions. We evaluate these models and compare their results with the previous unsupervised methods on a multilingual test corpus of 43 directives and their corresponding NIMs. We also develop supervised machine learning models to identify transpositions and compare their performance with different feature sets.
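As a rough illustration of the unsupervised, embedding-based similarity idea summarised above, the following sketch averages word vectors into a text vector and ranks candidate national provisions by cosine similarity to a directive provision. It is a minimal sketch under stated assumptions, not the system evaluated in the paper: the three-dimensional vectors in `TOY_VECTORS`, the helper names `embed`, `cosine` and `rank_candidates`, and the choice of simple averaging are all hypothetical, and a real system would load embeddings trained on the legal corpus.

```python
import math

# Hypothetical toy word vectors (assumption); real embeddings would be
# learned from the multilingual legal corpus.
TOY_VECTORS = {
    "member": [0.9, 0.1, 0.0],
    "states": [0.8, 0.2, 0.1],
    "shall": [0.1, 0.9, 0.0],
    "ensure": [0.2, 0.8, 0.1],
    "waste": [0.0, 0.1, 0.9],
    "recycling": [0.1, 0.0, 0.8],
}


def embed(text, vectors):
    """Average the vectors of the known words in `text` (zero vector if none)."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[w] for w in text.lower().split() if w in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def rank_candidates(directive_text, candidates, vectors):
    """Return (score, candidate) pairs sorted from most to least similar."""
    query = embed(directive_text, vectors)
    scored = [(cosine(query, embed(c, vectors)), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    ranked = rank_candidates(
        "waste recycling",
        ["member states shall ensure", "recycling waste"],
        TOY_VECTORS,
    )
    for score, text in ranked:
        print(f"{score:.3f}  {text}")
```

Averaging is only one way to turn word vectors into a text vector; a paragraph embedding model would produce the text vector directly and could be swapped in for `embed` without changing the ranking step.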
The output vector is computed by multiplying the embedding vector by the hidden layer.
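One possible reading of this note, assuming a word2vec-style shallow network, is that a word's embedding vector (the hidden-layer activation) is multiplied by an output weight matrix to score every vocabulary word, and the scores are then normalised with a softmax. The sketch below illustrates that reading only; the toy weights `EMBEDDINGS` and `OUTPUT_WEIGHTS`, the three-word vocabulary, and the function names are hypothetical and are not taken from the paper.

```python
import math


def vector_times_matrix(vector, matrix):
    """Multiply a length-d row vector by a d x V matrix, giving V scores."""
    n_columns = len(matrix[0])
    return [
        sum(v * row[j] for v, row in zip(vector, matrix))
        for j in range(n_columns)
    ]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy weights (assumption): vocabulary of 3 words, dimension 2.
EMBEDDINGS = [  # input weights, one row per vocabulary word (V x d)
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]
OUTPUT_WEIGHTS = [  # output weights (d x V)
    [2.0, 0.0, 1.0],
    [0.0, 2.0, 1.0],
]


def output_distribution(word_index):
    """Forward pass: embedding lookup, output scores, softmax."""
    # Selecting a row equals multiplying a one-hot vector by EMBEDDINGS.
    hidden = EMBEDDINGS[word_index]
    scores = vector_times_matrix(hidden, OUTPUT_WEIGHTS)
    return softmax(scores)


if __name__ == "__main__":
    print(output_distribution(0))
```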
The research presented in this paper was conducted as part of a Ph.D. at the University of Turin, within the Erasmus Mundus Joint International Doctoral (Ph.D.) programme in Law, Science and Technology. This work has been partially supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant agreement no. 690974 for the project “MIREL: MIning and REasoning with Legal texts”.
Cite this article
Nanda, R., Siragusa, G., Di Caro, L. et al. Unsupervised and supervised text similarity systems for automated identification of national implementing measures of European directives. Artif Intell Law 27, 199–225 (2019). https://doi.org/10.1007/s10506-018-9236-y