1 Introduction

This article describes the creation of a lightweight ontology of criminal procedural rights in judicial cooperation. The ontology is intended to help legal practitioners understand the precise contextual meaning of terms and to inform the creation of a rule ontology of criminal procedural rights in judicial cooperation.

The task of identifying the scope of application of legislative provisions is not straightforward. This is especially true of definitions of legal concepts, which are usually enshrined in the initial provisions of laws. However, legal acts do not always contain legislation-specific definitions, and legal practitioners (especially judges) undertake sophisticated legal reasoning to clarify and justify the scope of application of norms on a case-by-case basis.

To clarify this scope, courts adopt interpretative methods, including systematic interpretation, which aims to determine the meaning of norms “by considering the law in its statutory (Gesetz, lois) or other legal context only, that is, in one or several legal acts of the same legal system” (Padjen 2020, p. 192).

Systematic interpretation of a legal provision essentially relies on norms contained in the legal source under scrutiny or in other sources. For example, the Court of Justice of the European Union (CJEU) often constructs its legal reasoning on the recitals that form the preamble of European Union (EU) legislation. Recitals are not legally binding, but courts invoke them when interpreting substantive provisions. As a result, where definitions of certain legal concepts are absent from the articles, their meaning can be identified through other legislative provisions that clarify certain aspects of those concepts.

Furthermore, definitions often include “connecting keywords” that complicate their interpretation, e.g. “for instance”, “for example”, “including”, “excluding” and “such as”. We created an “analogical” ontology to overcome this issue, i.e. to help legal practitioners reason by analogy when definitions include concrete examples of the concepts that they define.

This aim can be pursued through ontology engineering, a well-established research area in legal informatics that is useful for improving the representation and understandability of laws (see Sect. 2).

Ontologies can vary in their level of specificity. For instance, the Legal Knowledge Interchange Format (LKIF) Core Legal Ontology (Hoekstra et al. 2007) is jurisdiction neutral, the Lexical Ontologies for legal Information Sharing (LOIS) ontology framework (Tiscornia 2006) has separate entries for EU and national legal terms, and the European Legal Taxonomy Syllabus (ELTS) ontology framework (Ajani et al. 2016), using a bottom-up approach, allows multiple definitions of terms per jurisdiction, with each definition explicitly linked to the source of the definition. The classical definitions used for the ELTS ontology often derive from a specific article dedicated to definitions. They often follow formulaic wording of the type “X means Y”, “X has the meaning of Y” or “X refers to Y”.

However, to the best of our knowledge, the scientific literature lacks an ontological analysis of the representation of so-called “implicit” definitions, which are scattered across segments of legal clauses. Our aim is to collect different aspects of concepts within laws to build comprehensive definitions that can support judicial decision-making, especially when judges are identifying the scope of application of different norms and definitions and have to build on several legal bases to justify their decisions. This work contributes to the state of the art by extending the definition types outlined in previous work and using them to manually extract implicit definitions and create an annotated dataset. The dataset can then be used in future work, including automated classification of text paragraphs and identification of different definition types with natural language processing (NLP).

A general study of the nature of definitions (Di Caro 2020) found that most classical definitions contain hypernyms (usually general rather than direct), meronyms, synonyms and purpose-related information. In the framework of the CrossJustice project,Footnote 1 we faced the unusual problem that, of the six relevant directives, only two contain an article dedicated to definitions. Article 3 of Directive 2016/800 contains three definitions, for the terms “child”, “holder of parental responsibility” and “parental responsibility”. Article 3 of the Directive on legal aid contains only one definition, for the term “legal aid”. Some classical definitions can be found elsewhere (and we do use them in the ontology). For instance, Recital 15 of Directive 2013/48 states that “[t]he term ‘lawyer’ in this Directive refers to any person who, in accordance with national law, is qualified and entitled, including by means of accreditation by an authorized body, to provide legal advice and assistance to suspects or accused persons.” However, such definitions are few, and apart from not being in the expected place, they also use different connecting keywords from those usually found in classical definitions. This provided us with an opportunity to explore a phenomenon typically neglected in the construction of domain-specific legal ontologies.

Whether classical definitions are present or absent, laws and legal sources in general are typically peppered with hidden definitions (in the sense that they are not clearly marked out as such) as well as incomplete definitions, which may nevertheless help legal practitioners (and legal reasoning systems) to reason on the basis of analogy or teleology. Such definitions can be found not only in articles but also in recitals, which play an important role in the legal interpretation of the Court of Justice of the European Union. In Humphreys et al. (2021), different types of such definitions were identified and described as follows:

  • example definition: a concept is explained in terms of typical examples. This class of definition in particular invites reasoning by analogy. There is a sense of completeness: the instances must belong either to the examples mentioned or to something similar.

  • include/exclude definition: include/exclude definitions are often used to emphasise the inclusion or exclusion of certain items where this would otherwise be uncertain or even surprising. Include/exclude definitions are incomplete as there may (or may not) be other items that are included or excluded.

  • definition by reference: some legislative acts refer explicitly to other legislative acts for their definitions of certain concepts. These definitions then also apply to the referring legislation by virtue of the explicit reference, i.e. the scope of a definition may be expanded to cover another legislative act where that act explicitly refers to the definition.

In our work of collecting and representing such definitions for the purpose of the CrossJustice ontology, we refined the above classification of definitions and identified further classes which are detailed in Sect. 2 below.

Applying the law necessarily involves applying abstract rules and concepts to specific scenarios. A term-based legal ontology can provide a useful source of reference for finding the meaning of terms and their relations with other terms, which can in turn help improve search functionalities and rule ontologies. Lawyers could benefit from a tool that can sum up the relevant features of legal concepts, not only to provide definitions of legal terms but, eventually, to increase the predictability of court decisions.

2 Related work

2.1 Domain-oriented legal ontologies

Ontologies are widely used in legal informatics to model and represent legal knowledge for human users and for machine processing. From the 2010s onwards, many ontologies have focused on legal subdomains and have been built with features and tools that make them more or less suitable for specific purposes.

Rather than representing the content of normative provisions such as permissions, prohibitions and obligations, our ontology contains definitions of concepts regarding criminal procedural rights in the European Union. Our domain-oriented approach follows the example of the Semantic Web, which provides effective methods for knowledge management through standardised, Web-accessible representations such as the Resource Description Framework (RDF) and the Web Ontology Language (OWL) (Leone et al. 2020).

Open Digital Rights LanguageFootnote 2 (ODRL), created by the ODRL Community Group,Footnote 3 represents policies for digital content and media (Steyskal and Polleres 2014). This language is composed of a core vocabulary suitable for modelling policies and a common vocabulary of general terms to describe the actions they contain in terms of their deontic type – obligations, permissions, prohibitions etc.

As an elaboration of ODRL, the Linked Data RightsFootnote 4 (LDR) ontology was designed by the Ontology Engineering GroupFootnote 5 and extends the ODRL classes of Action, Asset, Policy and Rule to model conditions of use for Linked Data resources.

As for strictly legal areas, the Creative Commons Rights Expression LanguageFootnote 6 (ccREL) is a standard that models copyright licensing terms in a machine-readable format. In a similar domain, the Licence for Linked Open DataFootnote 7 (L4LOD) vocabulary has a light ontological structure for managing terms related to licensing in the Web of Data. Deontic operators for permissions, prohibitions and obligations indicate which actions should be undertaken or avoided with Linked Open Data sources.

Some work on ontologies addresses the area of tenders and procurement. LOTED2Footnote 8 by Distinto et al. (2016) is intended to represent information about public procurement in the European Union. The ontology reuses terminology contained in Tenders Electronic Daily (TED)Footnote 9, a database hosting all the procurement notices published by the public institutions of European and EEA countries.

In a different way, the Public Procurement OntologyFootnote 10 (PPROC) by Muñoz-Soro et al. (2016) semantically represents information published in official Spanish and EU legal procurement documents. PPROC aims to model the tendering process, starting from the publication of contracts until their termination. To pursue this goal, PPROC provides, among other functionalities, a taxonomy of contracts involved in procurement procedures.

Some ontologies tackle the area of data protection and privacy law. GDPRtEXTFootnote 11 (GDPR text extensions) by Pandit et al. (2018) models the General Data Protection Regulation (EU) 2016/679 (GDPR). The Regulation is represented as a linked data resource in which each part of the law is assigned a Uniform Resource Identifier (URI), in order to provide a clear overview of the structure of the GDPR (including the identification of articles, recitals and citations) and to highlight the relations among their contents.

The contents of the GDPR are also represented in PrOnto (Privacy Ontology) by Palmirani et al. (2018). PrOnto's aims go beyond information retrieval: it is equipped with a theoretical framework of techniques for legal reasoning and compliance checking.

In a different way, PrivOnto, an ontology developed by Oltramari et al. (2018) in the context of the Usable Privacy Policy projectFootnote 12, models annotated privacy policies that explicate the data practices adopted by websites.

Finally, there have been several successful attempts to set up cross-domain ontologies, mainly for the purpose of reconstructing a fragmented legal reality.

EurovocFootnote 13 is a multilingual and multidisciplinary thesaurus managed and updated by the Publications Office of the European Union. Its purpose is to index the legal and political documents issued by European Union institutions to facilitate their retrieval.

Although far from a proper ontology, LegalRuleMLFootnote 14 by Palmirani et al. (2011) and Athan et al. (2015) is a standard for representing and sharing legal knowledge. More precisely, the LegalRuleML markup language enables harmonisation of different legal sources, including laws, guidelines and policies. In a similar way, the European Legislation IdentifierFootnote 15 (ELI) standard allows the publication of domestic legal sources with a uniform set of metadata in order to promote mutual access to documents by national administrations.

As an elaboration of LegalRuleML, the Normative Requirements VocabularyFootnote 16 (NRV) by Gandon et al. (2017) is an ontology that makes use of standard frameworks that exist in the Semantic Web in order to model normative requirements and rules.

2.2 Remarks on automated approaches

There is extensive literature on automated approaches for semantic relation extraction and taxonomy induction which could be considered for the task of definition modelling and automatic detection. However, most of the research on natural language processing and machine/deep learning is focused on the detection of lexico-syntactic patterns and neural network architectures (Auger and Barrière 2008; Smirnova and Cudré-Mauroux 2018; Pouran Ben Veyseh et al. 2020), whereas our problem statement concerns more complex (analogical) reasoning strategies that require novel research efforts and future directions.

In addition, current language technologies focusing on content extraction, detection and labelling show intrinsic limitations when it comes to reasoning capabilities and domain-specific knowledge. In this context, manually built ontologies, rather than semi-supervised methodologies, are often useful for enriching and improving such techniques. However, the problem of how to integrate the two semantic spaces is still far from solved, and further research in this direction is needed (Zhang et al. 2021).

3 Tools and methodology

The ontology described in this article provides definitions of terms from the following six directives:

  • Directive 2010/64/EU of the European Parliament and of the Council of 20 October 2010 on the right to interpretation and translation in criminal proceedings;

  • Directive 2012/13/EU of the European Parliament and of the Council of 22 May 2012 on the right to information in criminal proceedings;

  • Directive 2013/48/EU of the European Parliament and of the Council of 22 October 2013 on the right of access to a lawyer in criminal proceedings and in European arrest warrant proceedings, and on the right to have a third party informed upon deprivation of liberty and to communicate with third persons and with consular authorities while deprived of liberty;

  • Directive (EU) 2016/343 of the European Parliament and of the Council of 9 March 2016 on the strengthening of certain aspects of the presumption of innocence and of the right to be present at the trial in criminal proceedings;

  • Directive (EU) 2016/800 of the European Parliament and of the Council of 11 May 2016 on procedural safeguards for children who are suspects or accused persons in criminal proceedings;

  • Directive (EU) 2016/1919 of the European Parliament and of the Council of 26 October 2016 on legal aid for suspects and accused persons in criminal proceedings and for requested persons in European arrest warrant proceedings.

Alongside the six EU directives on procedural safeguards for persons subject to criminal proceedings and investigations, the dataset includes other legal sources referenced by the directives and all the judgments of the Court of Justice of the European Union that provide insightful interpretations on the provisions of the directives.

3.1 Theoretical basis

For our work of collecting and representing definitions for the purpose of the lightweight ontology, we started from the classes described by Humphreys et al. (2021). We then refined those classes and identified further ones. As a result, we used the following classes to identify definitions:

  • A classical (or regular) definitionFootnote 17 is what we typically envisage when we consider definitions. Classical definitions often use formulaic phrases to link the definiendum with the definiens, in the form of “X means Y” or “X is understood to mean Y”. Di Caro (2020) found that classical definitions typically contain synonyms, hypernyms, meronyms and/or purpose-related information. For our purposes, what distinguishes classical definitions from the other definition types described here is their sense of completeness. As an example, Article 1(1) of Directive 2010/64 states: “This Directive lays down rules concerning the right to interpretation and translation in criminal proceedings and proceedings for the execution of a European arrest warrant.” From this we obtain the following classical definition for Directive 2010/64: “EU legal act providing rules concerning the right to interpretation and translation in criminal proceedings and proceedings for the execution of a European arrest warrant.” More generally, classical definitions are marked by a higher degree of clarity, richness and readability (Di Caro 2020).

  • A part definition describes the components or elements of a concept whose meaning is best understood as the sum of its parts, such as a procedure or right. For example, in Article 4(2) of Directive 2012/13, quoted below, we can consider each lettered item as an individual piece of the information required in a Letter of Rights:

    “In addition to the information set out in Article 3, the Letter of Rights referred to in paragraph 1 of this Article shall contain information about the following rights as they apply under national law:

    (a) the right of access to the materials of the case;

    (b) the right to have consular authorities and one person informed;

    (c) the right of access to urgent medical assistance; and

    (d) the maximum number of hours or days suspects or accused persons may be deprived of liberty before being brought before a judicial authority.”

    Information about the components of the Letter of Rights is found in various normative provisions of Directive 2012/13, and linking all these components under the concept of the Letter of Rights can serve as a useful point of reference.

  • An essential part definition consists of components or elements of a concept that are crucial for that concept to exist. For example, in Recital 33 of Directive 2016/800, “[c]onfidentiality of communication between children and their lawyer is key to ensuring the effective exercise of the rights of the defence and is an essential part of the right to a fair trial.” The connecting keywords “is key to” and “is an essential part of” suggest an essential part definition in this instance, but other keywords can also signal this type.

  • A purpose definition seeks to explain a concept by its purpose. For example, Article 7(4) of Directive 2012/13 provides two legitimate reasons for refusing access to certain materials: “By way of derogation from paragraphs 2 and 3, provided that this does not prejudice the right to a fair trial, access to certain materials may be refused if such access may lead to a serious threat to the life or the fundamental rights of another person or if such refusal is strictly necessary to safeguard an important public interest, such as in cases where access could prejudice an ongoing investigation or seriously harm the national security of the Member State in which the criminal proceedings are instituted”. Accordingly, we recorded the following purposes as secondary concepts: (1) to avoid prejudicing an ongoing investigation and (2) to avoid seriously harming the national security of the Member State in which the criminal proceedings are instituted.

  • A parameter definition contains one or more parameters that are taken into account in the application of a legal concept and thereby clarify that concept. Article 8(2) of Directive 2016/800 provides a good example of a parameter definition in which a single parameter applies to multiple legal concepts: “The results of the medical examination shall be taken into account when determining the capacity of the child to be subject to questioning, other investigative or evidence-gathering acts, or any measures taken or envisaged against the child”.

  • A ratione temporis definition is constituted by the timeframe of the application of a legal concept such as a principle, right, obligation or even the whole directive. For example, Article 2(1) of Directive 2016/800 enshrines two ratione temporis definitions: “This Directive applies to children who are suspects or accused persons in criminal proceedings. It applies until the final determination of the question whether the suspect or accused person has committed a criminal offence, including, where applicable, sentencing and the resolution of any appeal”.

  • A ratione personae definition identifies the subjects of a legal concept such as a principle, right, obligation or even the whole directive. For instance, Article 2 of Directive 2016/343 provides that “This Directive applies to natural persons who are suspects or accused persons in criminal proceedings. It applies at all stages of the criminal proceedings, from the moment when a person is suspected or accused of having committed a criminal offence, or an alleged criminal offence, until the decision on the final determination of whether that person has committed the criminal offence concerned has become definitive”.

  • A typical example definition (an example definition subclass) uses a typical example of a wider concept to define the latter. For instance, according to Article 2(3) of Directive 2010/64, “[t]he right to interpretation under paragraphs 1 and 2 includes appropriate assistance for persons with hearing or speech impediments.”

  • An atypical example definition (an example definition subclass) is based on a specific example of a wider concept that is not commonly included in conceptions of the latter. For instance, in Article 1(2) of Directive 2010/64, the conclusion of the proceedings “is understood to mean the final determination of the question whether they have committed the offence, including, where applicable, sentencing and the resolution of any appeal”. The legislature decided to clarify that the conclusion of the proceedings includes the resolution of any appeal, which is not commonly conceived of as a stage in the proceedings and therefore represents an atypical example. Note that the connecting keyword “include” can be indicative of either a typical or an atypical example definition, depending on the context.

  • An important example definition (an example definition subclass), like a typical example definition, uses an example of a wider concept to define the latter. However, in this case, while inviting wider analogy, it emphasises that at least the inclusion of this particular case must be respected. For instance, in Recital 27 of Directive 2010/64, the duty of care towards suspected or accused persons who are in a potentially weak position is emphasised “in particular” towards those who have “any physical impairments which affect their ability to communicate effectively”.

  • A parameter example definition is a subclass of both example and parameter definitions. Like a parameter definition, it uses examples of parameters to clarify a concept. However, as with the various kinds of example definitions described here, the list of parameters is not exhaustive and therefore invites reasoning by analogy. This can be seen in the following extract from Recital 4 of Directive 2013/48: “The extent of the mutual recognition is very much dependent on a number of parameters, which include mechanisms for safeguarding the rights of suspects or accused persons and common minimum standards necessary to facilitate the application of the principle of mutual recognition.”

  • A non-example definition (an example definition subclass) uses an example that is explicitly excluded from a wider concept to provide a negative definition of the latter. For instance, Recital 13 of Directive 2013/48 excludes two specific proceedings from the wider concept of “criminal proceedings” and, in so doing, provides a clearer definition of that concept. The norm states that “proceedings in relation to minor offending which take place within a prison and proceedings in relation to offences committed in a military context which are dealt with by a commanding officer should not be considered to be criminal proceedings for the purposes of this Directive”.

  • A definition by reference reflects the fact that not every legislative act contains a definition for every concept, and some legislative acts explicitly refer to other acts for definitions of certain concepts. For example, Recital 49 of Directive 2016/343 states that “the Union may adopt measures in accordance with the principle of subsidiarity as set out in Article 5 TEU [Treaty on the European Union]. In accordance with the principle of proportionality, as set out in that Article, this Directive does not go beyond what is necessary in order to achieve those objectives.” The definitions of the principle of subsidiarity and the principle of proportionality in the TEU therefore apply explicitly to Directive 2016/343.

As Table 2 shows, the analogical category also contains classes other than example definitions. In general, analogical definitions introduce concepts by way of examples, which can be considered concrete expressions of abstract concepts.

In example-based definitions this is self-evident, because they use ad hoc examples. However, in light of the results of our annotation process, we decided to include other classes in the analogical category: we believe that part, essential part, purpose, parameter, ratione temporis and ratione personae definitions are marked by recurring lexical features that could help improve analogical reasoning and achieve exhaustive interpretations of concepts, as in judicial interpretation. More precisely, analogical definitions could represent a relevant feature for automated analogical reasoning when combined with NLP tools (Combs et al. 2022).
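The taxonomy described in this section can be summarised as a class hierarchy. The Python sketch below is illustrative only: the class names are hypothetical labels of our own rather than the identifiers used in Table 2, and Python classes stand in for OWL classes. It shows that a parameter example definition inherits from both the example and the parameter branches. Placing definitions by reference outside the analogical branch is an assumption, since they are not listed among the analogical classes above.

```python
"""Illustrative sketch of the definition-type taxonomy.

Class names are hypothetical; only the subclass relations follow the text.
"""


class Definition:
    """Root class for every definition type."""


class ClassicalDefinition(Definition):
    """Complete definition, typically worded 'X means Y'."""


class DefinitionByReference(Definition):
    """Definition imported by explicit reference to another legal act.
    Kept outside the analogical branch here as an assumption."""


class AnalogicalDefinition(Definition):
    """Incomplete definitions that support reasoning by analogy."""


class PartDefinition(AnalogicalDefinition): ...
class EssentialPartDefinition(AnalogicalDefinition): ...
class PurposeDefinition(AnalogicalDefinition): ...
class ParameterDefinition(AnalogicalDefinition): ...
class RationeTemporisDefinition(AnalogicalDefinition): ...
class RationePersonaeDefinition(AnalogicalDefinition): ...
class ExampleDefinition(AnalogicalDefinition): ...


# Subclasses of the example definition
class TypicalExampleDefinition(ExampleDefinition): ...
class AtypicalExampleDefinition(ExampleDefinition): ...
class ImportantExampleDefinition(ExampleDefinition): ...
class NonExampleDefinition(ExampleDefinition): ...


# Subclass of both the example and the parameter branches
class ParameterExampleDefinition(ExampleDefinition, ParameterDefinition): ...


def is_analogical(definition_type: type) -> bool:
    """Return True if a definition type belongs to the analogical category."""
    return issubclass(definition_type, AnalogicalDefinition)


if __name__ == "__main__":
    for cls in (ClassicalDefinition, NonExampleDefinition, ParameterExampleDefinition):
        print(cls.__name__, is_analogical(cls))
```

Multiple inheritance mirrors the fact that a parameter example definition is both non-exhaustive (like example definitions) and parameter-based (like parameter definitions).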

The work described in this article is influenced by the European Legal Taxonomy Syllabus (Ajani et al. 2016) in the following ways:

  • it is assumed that the scope of a definition is the legislative source itself, unless its scope has been explicitly restricted or expanded. In our work, restriction of scope is identified by phrases of the type “for the purposes of paragraph X”, while expansion of scope is identified by an explicit reference to a definition from another piece of legislation;

  • it is assumed that definitions are specific to the jurisdiction of the legislation concerned. In the context of the EU, it is expected that transposition of legislation into national law may modify the definitions of the concepts it contains, so that relations between related concepts need to be defined.
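The first of these scope assumptions can be sketched as a small resolution rule. The Python below is a minimal illustration under stated assumptions: the `DefinitionRecord` fields, the `scope_of` function and the regular expression for restriction phrases are hypothetical, and expansion is modelled simply as a list of legislative acts that explicitly refer to the definition. A phrase such as “for the purposes of this Directive” does not match the pattern, so such definitions keep the default scope.

```python
import re
from dataclasses import dataclass, field

# Hypothetical pattern for scope-restricting phrases such as
# "for the purposes of paragraph 3" or "for the purposes of Article 4(2)".
RESTRICTION = re.compile(
    r"for the purposes of (paragraph|article)\s+(\d+(?:\(\d+\))?)",
    re.IGNORECASE,
)


@dataclass
class DefinitionRecord:
    """Hypothetical record for one collected definition."""

    term: str
    source: str  # legislative source containing the definition
    text: str  # wording of the defining provision
    # Legislation that explicitly refers to this definition (scope expansion)
    referenced_by: list[str] = field(default_factory=list)


def scope_of(record: DefinitionRecord) -> list[str]:
    """Return the legislative units in which a definition applies."""
    match = RESTRICTION.search(record.text)
    if match:
        # Restricted scope: only the paragraph or article that is cited.
        unit = f"{match.group(1).capitalize()} {match.group(2)}"
        scope = [f"{record.source}, {unit}"]
    else:
        # Default scope: the legislative source itself.
        scope = [record.source]
    # Expanded scope: any legislation that explicitly refers to the definition.
    return scope + record.referenced_by


# Usage, based on Recital 49 of Directive 2016/343 referring to Article 5 TEU.
subsidiarity = DefinitionRecord(
    term="principle of subsidiarity",
    source="TEU",
    text="(wording of Article 5 TEU, not reproduced here)",
    referenced_by=["Directive 2016/343"],
)
print(scope_of(subsidiarity))  # ['TEU', 'Directive 2016/343']
```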

3.2 Ontological framework

The implementation of the ontology is based on the Linked Term Bank of Copyright-Related Terms (Rodriguez-Doncel et al. 2015), also known as the Copyright Term Bank. This ontology is also domain-specific, multilingual and multi-jurisdictional, although, like ELTS, it addresses a different legal domain. We chose it because its concepts and terms are organised intuitively, following best practice in ontology development. The Copyright Term Bank is in turn built on Lemon and Simple Knowledge Organization System (SKOS) classes. We imported the Copyright Term Bank into WebProtégéFootnote 18 so that we could analyse the structure of the ontology and build on it.

For the lightweight ontology, we reused the classes from the Copyright Term Bank shown in Table 1.

Table 1 Classes inherited from the Copyright Term Bank

To this list we have added the classes mentioned in Sect. 3.1. Table 2 lists the new definition types identified in our work.

Table 2 New analogical classes created for the analogical lightweight ontology

Since the new definition types described above necessarily involve relationships between concepts, we have chosen to also model relations between Concepts as per Table 3.

Table 3 “Is” and “has” relations related to the definition types

This duplication has the following advantages:

  1. it enables the original source text to be easily accessed in the definition instances;

  2. it enables users to visualise relations among different Concepts (from the point of view of the relevant legal source).
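To make the two representations concrete, the sketch below encodes one part definition (Article 4(2) of Directive 2012/13) both as a definition instance carrying its source text and as direct “has”/“is” relations between Concepts. It uses plain Python triples rather than an RDF library. The `ex:` identifiers and the `hasPart`/`isPartOf` predicate names are hypothetical placeholders; the relations actually used in the ontology are those listed in Table 3.

```python
from collections import defaultdict

# Hypothetical identifiers: the "ex:" names and the hasPart/isPartOf
# predicates stand in for the classes and relations of Tables 2 and 3.
LETTER_OF_RIGHTS_PARTS = [
    "ex:RightOfAccessToMaterialsOfTheCase",
    "ex:RightToHaveConsularAuthoritiesAndOnePersonInformed",
    "ex:RightOfAccessToUrgentMedicalAssistance",
    "ex:MaximumPeriodOfDeprivationOfLiberty",
]

# (1) A definition instance that keeps the original source text.
triples = [
    ("ex:LetterOfRightsPartDef", "rdf:type", "ex:PartDefinition"),
    ("ex:LetterOfRightsPartDef", "ex:defines", "ex:LetterOfRights"),
    ("ex:LetterOfRightsPartDef", "ex:source", "Directive 2012/13, Article 4(2)"),
    (
        "ex:LetterOfRightsPartDef",
        "ex:sourceText",
        "the Letter of Rights referred to in paragraph 1 of this Article shall "
        "contain information about the following rights as they apply under "
        "national law",
    ),
]

# (2) The same content duplicated as direct relations between Concepts.
triples += [("ex:LetterOfRights", "ex:hasPart", part) for part in LETTER_OF_RIGHTS_PARTS]


def add_inverses(statements, inverse_of):
    """Add the inverse 'is' relation for every listed 'has' relation."""
    extra = [(o, inverse_of[p], s) for s, p, o in statements if p in inverse_of]
    return statements + extra


def objects(statements, subject, predicate):
    """Return all objects linked to a subject by a predicate."""
    index = defaultdict(list)
    for s, p, o in statements:
        index[(s, p)].append(o)
    return index[(subject, predicate)]


graph = add_inverses(triples, {"ex:hasPart": "ex:isPartOf"})
print(objects(graph, "ex:LetterOfRights", "ex:hasPart"))
print(objects(graph, "ex:RightOfAccessToUrgentMedicalAssistance", "ex:isPartOf"))
```

The first group of statements corresponds to advantage 1 (the source text stays attached to the definition instance), while the derived `hasPart`/`isPartOf` pairs correspond to advantage 2 (relations can be browsed directly between Concepts).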

3.3 Data collection and analysis

Six European directives related to criminal procedural rights in judicial cooperation were analysed by a legal expert tasked with finding “classical” and “analogical” (or “non-classical”) definitions as described above. The legal expert also searched judgments of the Court of Justice of the European Union for interpretations that provide additional definitions useful for the ontology, as well as definitions from EU treaties and charters (e.g. the TEU, the Treaty on the Functioning of the European Union (TFEU) and the Charter of Fundamental Rights of the European Union (CFREU)) and international conventions (e.g. the European Convention on Human Rights (ECHR), the International Covenant on Civil and Political Rights (ICCPR) and the Vienna Convention on Consular Relations) which are referred to in the directives themselves. The latter are classed as definitions by reference in an Excel table of the definitions we collected.

The legal expert first analysed the directives in English and then compared the relevant normative provisions with their equivalents in the Italian, French and German versions. He found few significant differences in meaning, although there were some minor discrepancies.

For instance, the English version of Article 2(1) of Directive 2010/64 states that “Member States shall ensure that suspected or accused persons who do not speak or understand the language of the criminal proceedings concerned are provided, without delay, with interpretation during criminal proceedings before investigative and judicial authorities, including during police questioning, all court hearings and any necessary interim hearings”.

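The Italian version of the same provision states that “[g]li Stati membri assicurano che gli indagati o gli imputati che non parlano o non comprendono la lingua del procedimento penale in questione siano assistiti senza indugio da un interprete nei procedimenti penali dinanzi alle autorità inquirenti e giudiziarie, inclusi gli interrogatori di polizia, e in tutte le udienze, comprese le necessarie udienze preliminari.” (In English: “Member States shall ensure that suspects or accused persons who do not speak or understand the language of the criminal proceedings concerned are assisted without delay by an interpreter in criminal proceedings before investigative and judicial authorities, including police questioning, and in all hearings, including the necessary preliminary hearings.”)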

There is a structural difference in the way the concepts are related to one another. In the English version, the concept “criminal proceedings before investigative and judicial authorities” is illustrated with three examples, namely “police questioning”, “all court hearings” and “any necessary interim hearings”. In the Italian version, however, the third concept is an example of the second. The French and German versions follow the conceptual structure of the English version.
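One way to make such structural discrepancies explicit is to record each language version as a set of “example of” relations and compare the sets. The Python sketch below is a hypothetical illustration, not part of the annotation workflow described here. It uses English labels for aligned concepts in both versions and assumes that the first two relations are shared, since the text only states that the Italian version attaches the third concept to the second.

```python
# Example-of relations as (example, wider concept) pairs. English labels are
# used for both versions so that aligned concepts compare as equal.
PROCEEDINGS = "criminal proceedings before investigative and judicial authorities"

english = {
    ("police questioning", PROCEEDINGS),
    ("all court hearings", PROCEEDINGS),
    ("any necessary interim hearings", PROCEEDINGS),
}

# Assumption: the first two relations are shared; in the Italian version the
# third concept is an example of the second rather than of the main concept.
italian = {
    ("police questioning", PROCEEDINGS),
    ("all court hearings", PROCEEDINGS),
    ("any necessary interim hearings", "all court hearings"),
}


def structural_differences(version_a, version_b):
    """Return the example-of relations found in only one of two versions."""
    return version_a - version_b, version_b - version_a


only_english, only_italian = structural_differences(english, italian)
print(only_english)
print(only_italian)
```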

Conversely, linguistic comparison can also help resolve uncertainties and ambiguities. For instance, according to Recital 28 of Directive 2010/64, “[w]hen using videoconferencing for the purpose of remote interpretation, the competent authorities should be able to rely on the tools that are being developed in the context of European e-Justice (e.g. information on courts with videoconferencing equipment or manuals).” It seems strange that “information” should be an example of a “tool”. However, a comparison of the linguistic versions confirms that this is the intended meaning. The Italian version of the recital reads: “Quando si utilizza la videoconferenza per l’interpretazione a distanza, le autorità competenti dovrebbero poter utilizzare gli strumenti sviluppati nel contesto della giustizia elettronica europea (ad esempio informazioni sui tribunali che dispongono di materiale o di manuali per la videoconferenza).” (In English: “When videoconferencing is used for remote interpretation, the competent authorities should be able to use the tools developed in the context of European e-Justice (for example, information on courts that have equipment or manuals for videoconferencing).”) Linguistic comparison also clarified that Recital 28 provides only one instance of tools, “information on courts with videoconferencing equipment or manuals”, rather than two examples, “information on courts with videoconferencing equipment” and “manuals”; the latter reading is possible only in the English version.

The German wording of Article 6(3) of the ECHR explicitly refers to differences between the English and French texts: “Jeder Angeklagte hat mindestens (englischer Text) [emphasis added] insbesondere (französischer Text) [emphasis added] die folgenden Rechte...” [our translation: “Every accused person has at least (English text) in particular (French text) the following rights...”]. The English version of the article states that “[e]veryone charged with a criminal offence has the following minimum rights: [...]”, whereas the French version does not use the word for “minimum” but rather “notamment”, which means “in particular”. In the German provision, “mindestens” corresponds to “minimum” in the English text, while “insbesondere” translates “notamment”. Similarly, the Italian text uses the term “in particolare”, which means “in particular”. The insertion of both translations in the German text suggests that there is some semantic difference between “minimum” and “in particular”. We consider these differences to be lexical rather than semantic; nevertheless, it could be investigated whether these discrepancies result in divergent domestic approaches to the protection of the right to a fair trial under Article 6 of the ECHR. Our multilingual analysis is still ongoing, and future work could produce new results.

In the first phase of the work described in this article, the legal expert annotator selected normative provisions that contained typical connecting keywords indicative of certain definition types, such as “include”, “such as”, “inter alia” and “in particular”. He also deliberately selected negative examples, i.e. normative provisions that contain such keywords but do not contain definitions as described above, with a view to helping train a machine learning (ML) classifier in future work, so that populating analogical ontologies need not be carried out entirely manually. However, the keyword search was of limited use, with results pertaining mainly to the various kinds of example definitions.
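The keyword search of the first phase amounts to a simple filter over provision texts. The Python sketch below shows one way to reproduce it. The keyword list comes from the text; the stem-matching rule for “include”, the provision identifiers and the function names are illustrative assumptions, not the annotator's actual (manual) procedure. Every hit is only a candidate, which is why negative examples arise.

```python
import re

# Connecting keywords named in the text. "include" is matched as a stem so that
# "includes", "including" and "included" are also caught (but not "excluding").
KEYWORD_PATTERNS = {
    "include": r"\binclud\w*",
    "such as": r"\bsuch as\b",
    "inter alia": r"\binter alia\b",
    "in particular": r"\bin particular\b",
}


def find_keywords(text: str) -> list[str]:
    """Return the connecting keywords found in one provision (case-insensitive)."""
    return [name for name, pattern in KEYWORD_PATTERNS.items()
            if re.search(pattern, text, flags=re.IGNORECASE)]


def select_candidates(provisions: dict[str, str]) -> dict[str, list[str]]:
    """Keep only provisions with at least one keyword, mapped to the keywords found.

    A hit is only a candidate: an annotator still decides whether the provision
    contains a definition or is a negative example.
    """
    hits = {pid: find_keywords(text) for pid, text in provisions.items()}
    return {pid: kws for pid, kws in hits.items() if kws}


provisions = {
    "2010/64 Art. 2(3)": "The right to interpretation under paragraphs 1 and 2 includes "
                         "appropriate assistance for persons with hearing or speech impediments.",
    "hypothetical provision": "Member States shall ensure that this Directive is applied.",
}
print(select_candidates(provisions))  # {'2010/64 Art. 2(3)': ['include']}
```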

Other types of definitions required more intensive analysis, and the second phase involved reading the entire directives attentively, looking for instances of any type of definition as described above. Even identifying classic definitions required searching through the entire directives, because classic definitions worded in the way usual for European directives, i.e. in articles dedicated to definitions, are scarce. Our experience suggests that a semi-automated system may have more success with some types of definitions than with others.

In the third phase, a second annotator, an expert in legal informatics, carried out a second check to evaluate whether any definitions were missing. This check was carried out after the concepts had been extracted (see below), as our notion of definition types had expanded, particularly with regard to purpose and parameter definitions. This reflects the novelty of the work carried out for this article, which required refinement as it progressed.

Table 4 and Fig. 1 show the final results of the definition modelling, extraction and verification phases.

Table 4 Overview of the definitions in the six directives that were extracted and modelled
Fig. 1 Distribution of the definition types over the six directives

In our work, we took definitions from both recitals and articles. From the beginning, we noted that these directives contain more recitals than articles and that the recitals also tend to be longer than the articles (Table 5).

Table 5 Number of articles and recitals per directive and total word count of recitals and articles per directive

This is unfortunate, as recitals have a lower legal value than articles. Nevertheless, we take them as indicating the intentions of the legislator and therefore as helpful for legal interpretation. In any case, our ontology records the precise source of each definition, so that users can weigh each source as they see fit.

During our discussions, there were some normative provisions that we were unsure how to represent. We used a simple colour-coding scheme to track our progress: green for “agreed”, orange for “in discussion” and red for normative provisions that did not contain definitions.

Due to other commitments, we were unable to continue to work together on all the directives. Therefore, in a subsequent phase, the legal informatics annotator extracted some concepts and her work was then reviewed by the legal expert annotator.

4 Ontology creation

We chose to use WebProtégé (Footnote 19) to create the ontology, given its established reputation as an ontology editor and its functionalities that allow several users to view and collaborate on projects. In practice, while the tool was reasonably user-friendly, we found it difficult to make changes once Concepts had been inserted. For instance, we were unable to permanently amend or remove fields that contained errors, as shown in Fig. 2.

Fig. 2 Example of an error that cannot be easily removed from WebProtégé

For this reason, we decided to do all our analytical work in Excel and populate the ontology as the final step.
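The text does not describe how the spreadsheet contents were transferred into WebProtégé. One possible route, sketched below under stated assumptions, is to export the sheet as CSV and convert each row into N-Triples for import. The column names (`concept`, `definition_type`, `definition_text`, `source`), the namespace `http://example.org/cj#` and the properties `value` and `hasDefinition` are hypothetical placeholders, not the ontology's actual IRIs.

```python
import csv
import io
import re

BASE = "http://example.org/cj#"  # hypothetical namespace, not the ontology's real IRI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"


def slug(text: str) -> str:
    """Turn a concept label into an IRI-safe local name."""
    return re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_")


def literal(text: str) -> str:
    """Escape a string as an English-tagged N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@en'


def rows_to_ntriples(csv_text: str) -> list[str]:
    """Convert spreadsheet rows (exported as CSV) into N-Triples lines."""
    lines = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=1):
        concept = f"<{BASE}{slug(row['concept'])}>"
        definition = f"<{BASE}def_{i}>"  # one definition individual per row
        lines += [
            f"{concept} <{RDF_TYPE}> <{BASE}Concept> .",
            f"{concept} <{RDFS_LABEL}> {literal(row['concept'])} .",
            f"{definition} <{RDF_TYPE}> <{BASE}{row['definition_type']}> .",
            f"{definition} <{BASE}value> {literal(row['definition_text'])} .",
            f"{definition} <{RDFS_COMMENT}> {literal(row['source'])} .",
            f"{concept} <{BASE}hasDefinition> {definition} .",
        ]
    # A concept on several rows yields repeated type/label lines; drop duplicates.
    return list(dict.fromkeys(lines))


sample = (
    "concept,definition_type,definition_text,source\n"
    "the right to interpretation,TypicalExampleDefinition,"
    '"includes appropriate assistance for persons with hearing or speech impediments",'
    '"Directive 2010/64, Art. 2(3)"\n'
)
for line in rows_to_ntriples(sample):
    print(line)
```

Keeping the source in `rdfs:comment` mirrors the comment field described for the ontology, which records the original provision for reference.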

Below, we describe the structure of our ontology, with some examples.

Our first step was to import the Linked Term Bank of Copyright-Related Terms (Rodriguez-Doncel et al. 2015) into WebProtégé so that we could analyse the structure of that ontology and build on it. The Term Bank was available as an N-Triples file (Footnote 20), a line-based Resource Description Framework (RDF) serialisation, which we then converted into another RDF file format using an online conversion tool (Footnote 21).

The RDF file was then imported into WebProtégé. Unfortunately, none of the SenseDefinition instances could be viewed, and the relations between the Concepts and their SenseDefinitions were lost. However, we did not intend to reuse the data: our goal was to understand the general structure of the Term Bank, which we were able to do mainly through WebProtégé, consulting the N-Triples and RDF files where necessary.
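Consulting an N-Triples file directly is relatively easy because each line holds one triple. The sketch below assumes a simplified subset of the N-Triples grammar (no comments after a triple, no whitespace inside terms) and counts instances per class via `rdf:type`, which is one way of obtaining an overview of a vocabulary's structure. It is not a conformant parser, and the sample IRIs are hypothetical; a library such as rdflib would be the more robust choice for real files.

```python
import re
from collections import Counter

# Simplified N-Triples line: IRI or blank-node subject, IRI predicate, and an
# IRI, blank node or literal (optionally @lang or ^^<datatype>) as object.
TRIPLE = re.compile(
    r'^\s*(<[^>]*>|_:\S+)\s+(<[^>]*>)\s+'
    r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:@[A-Za-z0-9-]+|\^\^<[^>]*>)?)\s*\.\s*$'
)
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def parse_ntriples(text: str) -> list[tuple[str, str, str]]:
    """Split N-Triples text into (subject, predicate, object) strings."""
    triples = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and whole-line comments
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"unsupported line: {line!r}")
        triples.append(match.groups())
    return triples


def class_counts(triples: list[tuple[str, str, str]]) -> Counter:
    """Count instances per class using rdf:type triples."""
    return Counter(obj for _, pred, obj in triples if pred == RDF_TYPE)


# Hypothetical sample in the style of a term bank; IRIs are illustrative only.
sample = """\
<http://example.org/tb#work> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/tb#Concept> .
<http://example.org/tb#work> <http://www.w3.org/2000/01/rdf-schema#label> "work"@en .
<http://example.org/tb#def1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/tb#SenseDefinition> .
# a comment line
<http://example.org/tb#author> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/tb#Concept> .
"""
print(class_counts(parse_ntriples(sample)))
```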

Here is a summary of the structure of the Linked Term Bank of Copyright-Related Terms:

  • owl:Thing has 4 direct subclasses: Concept, LexicalEntry, LexicalSense and SenseDefinition;

  • Concepts have one or more of the following AnnotationProperties:

    • rdfs:label: the most common term for this Concept represented with a plainLiteral string value;

    • skos:definition: a link to an instance of a SenseDefinition, which provides the definition, source and other relevant data;

    • isSenseOf: a link to one or more LexicalEntry instances, which provide the terms used to express the Concept;

    • jurisdiction: a link to a DBpedia entry which provides information about the jurisdiction;

    • reference: a link to a DBpedia entry;

    • closeMatch: a link to a similar concept in the Interactive Terminology for Europe (IATE) EU terminology database;

    • narrower: a link to an instance of a narrower Concept;

    • rdfs:comment: a plainLiteral value.

The AnnotationProperties rdfs:label, skos:definition and isSenseOf appear in all Concepts.

  • LexicalEntries have the following AnnotationProperties:

    • rdfs:label: a term used to express a Concept in a plainLiteral value;

    • denotes: a link to one or more Concept instances denoted by the term;

    • language: the language of the term, as a plainLiteral value;

    • sense: an owl:NamedIndividual of the LexicalSense class.

The AnnotationProperties rdfs:label, denotes and sense appear in all LexicalEntries.

  • LexicalSenses have the following AnnotationProperty:

    • reference: a link to one or more instances of the Concept class.

  • SenseDefinitions have the following properties:

    • source: the name and URI of the glossary of terms that is the source of the definition;

    • value: the definition as a plainLiteral value, with that value having a “lang” property.
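The summary above also states which annotation properties occur on every Concept and every LexicalEntry. Read as a constraint on the data structure, this can be checked mechanically. The minimal sketch below assumes each instance is represented as a Python dictionary from property names to lists of values; that representation and the sample instance are illustrative assumptions, and property names are written as they appear in the list (`denotes` without a prefix).

```python
# Annotation properties that, according to the summary, every instance carries.
REQUIRED = {
    "Concept": {"rdfs:label", "skos:definition", "isSenseOf"},
    "LexicalEntry": {"rdfs:label", "denotes", "sense"},
}


def missing_properties(cls: str, annotations: dict[str, list[str]]) -> set[str]:
    """Return the required properties that are absent or empty on one instance."""
    present = {prop for prop, values in annotations.items() if values}
    return REQUIRED.get(cls, set()) - present


# Hypothetical Concept instance; the ex: identifiers are illustrative only.
concept = {
    "rdfs:label": ["work"],
    "skos:definition": ["ex:def_work"],
    "isSenseOf": ["ex:entry_work_en"],
    "closeMatch": [],  # optional property, left empty
}
print(missing_properties("Concept", concept))  # set()
print(missing_properties("LexicalEntry", {"rdfs:label": ["work"]}))
```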

For our ontology, we kept all the above classes and properties, since we are also interested in representing classical definitions. However, in addition to SenseDefinitions (renamed ClassicalDefinitions), we created other types of definitions, so that the overall class structure is now:

  • Concept

  • LexicalSense

  • LexicalEntry

  • Definition

    • ClassicalDefinition

      • PartDefinition

      • EssentialPartDefinition

      • PurposeDefinition

      • ParameterDefinition

      • RationeTemporisDefinition

      • RationePersoneDefinition

    • AnalogicalDefinition

      • TypicalExampleDefinition

      • AtypicalExampleDefinition

      • ImportantExampleDefinition

      • ParameterExampleDefinition

      • NonExampleDefinition
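The Definition branch of this hierarchy can be written as a child-to-parent map, which makes subclass questions easy to answer, for instance whether a TypicalExampleDefinition counts as an AnalogicalDefinition. The minimal Python sketch below covers only the Definition subtree (Concept, LexicalSense and LexicalEntry are omitted) and is an illustration, not an export of the ontology.

```python
CLASSICAL_SUBTYPES = ("PartDefinition", "EssentialPartDefinition", "PurposeDefinition",
                      "ParameterDefinition", "RationeTemporisDefinition",
                      "RationePersoneDefinition")
ANALOGICAL_SUBTYPES = ("TypicalExampleDefinition", "AtypicalExampleDefinition",
                       "ImportantExampleDefinition", "ParameterExampleDefinition",
                       "NonExampleDefinition")

# Child -> parent map for the Definition branch of the class structure above.
PARENT = {
    "ClassicalDefinition": "Definition",
    "AnalogicalDefinition": "Definition",
    **{c: "ClassicalDefinition" for c in CLASSICAL_SUBTYPES},
    **{c: "AnalogicalDefinition" for c in ANALOGICAL_SUBTYPES},
}


def ancestors(cls: str) -> list[str]:
    """Superclasses of `cls`, nearest first (empty if `cls` has no parent here)."""
    chain = []
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain


def is_subclass_of(cls: str, other: str) -> bool:
    """True if `cls` is `other` or one of its transitive subclasses."""
    return cls == other or other in ancestors(cls)


print(ancestors("TypicalExampleDefinition"))  # ['AnalogicalDefinition', 'Definition']
print(is_subclass_of("PurposeDefinition", "AnalogicalDefinition"))  # False
```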

Here is an example of a typical example definition from Article 2(3) of Directive 2010/64:

The right to interpretation under paragraphs 1 and 2 includes appropriate assistance for persons with hearing or speech impediments.

In the ontology, the Concept “the right to interpretation” is linked to a TypicalExampleDefinition, which has a field for the definition itself and a comment field giving the source article for reference. A separate Concept represents “appropriate assistance for persons with hearing or speech impediments” (Fig. 3).

Fig. 3 The concept of “the right to interpretation” represented as an instance of Concept in the ontology

Fig. 4 Example of RelationshipProperties for the representation of an ImportantExampleDefinition

The Copyright Term Bank relies entirely on AnnotationProperties to link Concepts to their LexicalSenses, LexicalEntries and SenseDefinitions, whereas some of the definitions in our ontology are themselves constituted by relations to other Concepts. We therefore use RelationshipProperties to represent such relations. This also allows users to visualise the relations between different Concepts, as can be seen in Fig. 4.
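Because such relations connect Concepts to other Concepts, they can be followed in both directions, which is what makes a graph view possible. The sketch below stores relations as (subject, relation, object) triples and indexes them as outgoing and incoming edges. The relation names `hasTypicalExample` and `hasImportantExample` are hypothetical (the text does not give the property names used), and the Definition individuals that sit between the Concepts in the ontology are collapsed into a single labelled edge for simplicity.

```python
from collections import defaultdict

# Hypothetical relation names; Definition individuals are collapsed into one edge.
RELATIONS = [
    ("the right to interpretation", "hasTypicalExample",
     "appropriate assistance for persons with hearing or speech impediments"),
    ("suspected or accused persons who are in a physically weak position",
     "hasImportantExample",
     "suspected or accused persons who have any physical impairments "
     "which affect their ability to communicate effectively"),
]


def build_index(triples):
    """Index (subject, relation, object) triples as outgoing and incoming edges."""
    outgoing = defaultdict(list)
    incoming = defaultdict(list)
    for subject, relation, obj in triples:
        outgoing[subject].append((relation, obj))
        incoming[obj].append((relation, subject))
    return outgoing, incoming


outgoing, incoming = build_index(RELATIONS)
# Which Concept does "appropriate assistance ..." exemplify?
print(incoming["appropriate assistance for persons with hearing or speech impediments"])
# [('hasTypicalExample', 'the right to interpretation')]
```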

Here is an ImportantExampleDefinition from Recital 27 of Directive 2010/64:

The duty of care towards suspected or accused persons who are in a potentially weak position, in particular because of any physical impairments which affect their ability to communicate effectively, underpins a fair administration of justice. The prosecution, law enforcement and judicial authorities should therefore ensure that such persons are able to exercise effectively the rights provided for in this Directive, for example by taking into account any potential vulnerability that affects their ability to follow the proceedings and to make themselves understood, and by taking appropriate steps to ensure those rights are guaranteed.

Fig. 5 The concept of “suspected or accused persons who are in a physically weak position” represented as an instance of Concept in the ontology

Figure 5 shows the concept of “suspected or accused persons who are in a physically weak position” represented as an instance of Concept in the ontology.

By way of example, Fig. 6 shows the ImportantExampleDefinition for that concept.

Fig. 6 An ImportantExampleDefinition of the concept of “suspected or accused persons who are in a physically weak position”

Furthermore, Fig. 7 illustrates the relation between that Concept and “suspected or accused persons who have any physical impairments which affect their ability to communicate effectively”:

Fig. 7 The relations between the Concepts “suspected or accused persons who are in a physically weak position” and “suspected or accused persons who have any physical impairments which affect their ability to communicate effectively”

Fig. 8 Graphical representation of definitions in the analogical lightweight ontology

Finally, Fig. 8 shows all the definitions represented in the analogical lightweight ontology, distinguishing classical definitions (marked in red) from analogical definitions (marked in green). In this graph, the classes of definitions are the types of our ontology, while the provisions (recitals and articles) that contain the definitions (or parts of them) are the instances. We have also coloured the edges red or green according to the corresponding definition type, to show the overall coverage of each type in the data. As can be seen, analogical definitions account for the majority of occurrences, which suggests that modelling them is worthwhile and could support the development of semi-automated extraction technologies in the future.

As we have explained, the analysed directives do not contain provisions dedicated to defining their most relevant concepts. For this reason, the classical definitions in the ontology are mostly the result of our efforts to assemble segments of definitions found in different parts of the directives, in judgments and in other legal sources referred to in the directives under consideration. In some cases, as the graph shows, it was not possible to assemble classical definitions because of the lack of a “sense of completeness”, e.g. in the absence of purpose-related information. In these circumstances, we decided to keep each analogical definition separate, since such definitions are less “complete” than classical definitions and provide only partial, albeit important, information, e.g. example, temporal or person-based information.
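The red/green edge colouring in Fig. 8 reduces to classifying each edge by the family of its definition class and counting. The sketch below assumes the graph is available as (provision, definition class) pairs. Apart from the two Directive 2010/64 examples discussed above, the sample pairs are invented for illustration and do not reproduce the article's data, which are summarised in Table 4 and Fig. 1.

```python
from collections import Counter

CLASSICAL = {
    "ClassicalDefinition", "PartDefinition", "EssentialPartDefinition",
    "PurposeDefinition", "ParameterDefinition", "RationeTemporisDefinition",
    "RationePersoneDefinition",
}
ANALOGICAL = {
    "AnalogicalDefinition", "TypicalExampleDefinition", "AtypicalExampleDefinition",
    "ImportantExampleDefinition", "ParameterExampleDefinition", "NonExampleDefinition",
}


def edge_colour(definition_class: str) -> str:
    """Red for classical definition classes, green for analogical ones."""
    if definition_class in CLASSICAL:
        return "red"
    if definition_class in ANALOGICAL:
        return "green"
    raise ValueError(f"not a definition class: {definition_class}")


def coverage(edges: list[tuple[str, str]]) -> dict[str, float]:
    """Share of red and green edges among (provision, definition class) pairs."""
    counts = Counter(edge_colour(cls) for _, cls in edges)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {colour: counts[colour] / total for colour in ("red", "green")}


# Sample edges for illustration only; the last two provisions are invented.
sample = [
    ("Directive 2010/64, Art. 2(3)", "TypicalExampleDefinition"),
    ("Directive 2010/64, Recital 27", "ImportantExampleDefinition"),
    ("hypothetical article", "PurposeDefinition"),
    ("hypothetical recital", "NonExampleDefinition"),
]
print(coverage(sample))  # {'red': 0.25, 'green': 0.75}
```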

5 Conclusions and future work

The work presented in this article was designed for the CrossJustice project. It focuses on six directives that contain few definitions in the classical sense and many definitions of other (non-classical) types, such as example, non-example, purpose and parameter definitions, which can aid analogical, teleological and systematic reasoning. For this reason, we decided to build a lightweight ontology to manage this information in a computational and machine-readable way. This represents a novel approach that is of value not only to the CrossJustice project but also to the wider legal informatics community.

For this work, we chose to model principles as Concepts, which are often related by a Purpose link to or from other Concepts. Principles such as “the right to access to a lawyer” are often defined in higher-ranking legislation to which the directives refer. In future work, we could model the relationships between legislative sources more explicitly. For instance, many of the selected directives give effect to higher-ranking instruments such as the ECHR and the CFREU. Moreover, directives often refer to policy measures such as the Stockholm Programme, adopted in 2009. It would be interesting, for instance, to assess how far the directives pursue the objectives set out in the policies undertaken by the EU, perhaps through network analysis and ontological approaches.

While creating this ontology, we were struck by the significant number of references to higher legislative sources, in particular the CFREU, the ECHR and the ICCPR. Particularly in this legal domain, the issue of harmonisation is far more complex than the two-level (national and EU) perspective adopted by the CrossJustice project. Future ontological research could address, for instance, the international and European multi-level protection of fundamental rights. The ICCPR entered into force in 1976 and has been ratified by all EU member states. Accordingly, the EU is committed to implementing its principles. The framework is further complicated by the relationship between the CFREU and the ECHR in light of Article 52(3) of the CFREU. That provision makes the interpretation of Convention rights by the European Court of Human Rights in Strasbourg a minimum standard, a threshold that cannot be lowered by EU law. The CFREU has been implemented directly in EU legal acts (which have been transposed and implemented by the EU member states) and through the judgments of the CJEU. For each of these instruments, there is a corresponding court or authority with adjudicative functions. For the ICCPR, that authority is the Human Rights Committee, a body of independent experts who monitor implementation of the Covenant. For the ECHR, adopted within the framework of the Council of Europe, it is the European Court of Human Rights, and for the CFREU, it is the Court of Justice of the European Union. Many scholars have sought to describe the dialogue between these legislative instruments and judicial or quasi-judicial authorities. This dialogue draws on a variety of legal sources, such as judgments, decisions, legislation, and non-binding legal and political instruments (soft law).
We believe that these complex dynamics arising from the multi-level protection of human rights pose important challenges for legal informatics, and that the nascent field of legal harmonisation should address them. Of particular relevance to this article is our view that a domain-specific legal ontology should include the definitions of concepts from all relevant jurisdictional levels.