A Bottom-Up Approach for Licences Classification and Selection
Licences are a crucial aspect of the information publishing process in the web of (linked) data. Recent work on modeling policies with semantic web languages (RDF, ODRL) makes it possible to formally describe licences and reason upon them. However, choosing the right licence is still challenging. In particular, understanding the number of features (permissions, prohibitions and obligations) constitutes a steep learning process for the data provider, who has to check them individually and compare the licences in order to pick the one that best fits her needs. The objective of the work presented in this paper is to reduce the effort required for licence selection. We argue that an ontology of licences, organized by their relevant features, can help support the user. By developing an ontology with a bottom-up approach based on Formal Concept Analysis, we show how the process of licence selection can be simplified significantly and reduced to answering an average of three to five key questions.
Keywords: RDF, Licences and linked data, Formal Concept Analysis
Licence specification is an important part of the data publishing process on the web. Recently, part of the Semantic Web and Linked Data community has been focusing on supporting the expression of policies on the semantic web. The Open Digital Rights Language (ODRL) provides an ontology for representing policies in the semantic web, and it is used and extended to formally express the permissions, prohibitions and duties that licences include. The RDF Licenses database is a first notable attempt at developing a knowledge base of licences described following ODRL. However, identifying suitable licences is still not a trivial task for a data publisher. In the current version, ODRL identifies more than fifty possible actions to be used as permissions, prohibitions or obligations, and there are ontologies that extend ODRL by adding even more fine-grained policies (e.g. LDR). Therefore, not only are there many licences that can be applied, but each might include any subset of the many possible features (permitted, prohibited and required actions), which need to be explored in order to obtain a small selection of comparable licences to choose from.
The question that this paper aims to answer is: How can we reduce the effort for licence identification and selection? We advance the hypothesis that an ontology defining relevant classes of licences, formed on the basis of the key features of the instances, should facilitate the selection and identification of a suitable licence. The methodology applied relies on a bottom-up approach to ontology construction based on Formal Concept Analysis (FCA). We developed a tool, Contento, with the purpose of analysing data about licences using FCA, in order to generate a concept lattice. This concept lattice is used as a draft taxonomy of licence classes that, once properly annotated and pruned, can be exported as an OWL ontology and curated with existing ontology editors. We applied this approach to the use case of licence identification, and created a service to support data providers in licence selection by asking a few key questions about their requirements. We show that, with this service, we can reduce the selection of licences from comparing more than fifty possible licence features, to answering on average three to five questions.
The next section surveys related work. Section 3 describes the process of building the ontology, the Contento tool and the modeling choices that have been made. In Sect. 4 we report on the application of the ontology in a service for identifying suitable licences for data providers. Finally, we discuss some future work in the concluding Sect. 5.
2 Related Work
Licence recommendation is very common on the web, particularly for software. Services like http://choosealicense.com/ are usually based on common and well-known concerns, and recommend a restricted number of trusted solutions. The Creative Commons Choose service shares with our approach a workflow based on a few questions. However, it is an ad hoc tool that focuses on selecting a Creative Commons licence. In contrast, we are interested in applying a knowledge-based approach, where the way information about licences and requirements is modelled guides the path to the solution.
The Open Digital Rights Language (ODRL) is a rights expression language formalised as an XML Schema. Recently, an alternative representation based on RDF/OWL has been identified as the backbone for representing policies in the semantic web. The RDF Licenses database includes the description of licences in RDF. We used this database as the starting point for the present work. However, population and curation of such a knowledge base is clearly a necessary step for licence recommendation systems. For example, the descriptions do not specify the types of assets a licence is eligible for (and we do not cover this aspect in the present paper). Enriching the terms available to express policies will help increase the precision and quality of the descriptions (see LiMO, L4LOD and ODRS). Applying natural language processing techniques, such as those proposed in the literature, can facilitate the process of data acquisition.
Licentia is a tool for supporting users in choosing a licence for the web of data. Similarly to our approach, it is based on the RDF Licenses database. The user selects possible permissions, prohibitions and duties extracted from the licence descriptions in order to specify her requirements. The system applies reasoning over the database of licences, proposing a list of compatible ones to choose from. With this approach the user needs to perform an action for each of her requirements. Our approach restricts the number of questions through the inferences implied by the classification of licences in a hierarchy (e.g. any “share alike” licence allows distribution) and only suggests the questions for which a solution actually exists.
The approach proposed in this paper relies on an ontology of licences as a means for licence selection. Such an ontology has been created following a bottom-up approach. Bottom-up approaches for ontology design have been commonly applied in knowledge engineering, and we use here one particular method based on Formal Concept Analysis (FCA). FCA has been successfully used in the context of recommender systems [7, 8]. Moreover, FCA has been proposed in the past to support ontology design and other ontology engineering tasks [9, 10]. In the present work we use FCA as a learning technique to boost the early stage of the ontology design.
3 Building the Ontology
Our hypothesis is that an ontology can help orient the user in the complex set of existing licences and policies. The RDF Licenses database contains 139 licences expressed in RDF/ODRL. Our idea is therefore to start from the data to create the ontology. A further reason for choosing a bottom-up approach to ontology construction is that the data will include only policies that are relevant.
In order to support the production of the ontology we implemented a bottom-up ontology construction tool called Contento, which relies on FCA. FCA can classify collections of objects depending on their features. The input of an FCA algorithm is a formal context, that is, a binary matrix having the full set of objects as rows and the full set of attributes as columns. Objects and attributes are analysed and clustered into closed concepts by FCA. In FCA, a concept consists of a pair of sets, objects and attributes: the objects are the extent of the concept and the attributes its intent.
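In standard FCA notation, the two derivation operators behind this definition can be written as follows. The symbols \(O\), \(A\) and \(I\) are introduced here for illustration only: \(O\) is the set of objects, \(A\) the set of attributes, and \(I \subseteq O \times A\) the binary relation encoded by the matrix.

\[
X' = \{\, a \in A \mid \forall o \in X : (o, a) \in I \,\}, \qquad
Y' = \{\, o \in O \mid \forall a \in Y : (o, a) \in I \,\}
\]

A pair \((X, Y)\) with \(X \subseteq O\) and \(Y \subseteq A\) is a concept when \(X' = Y\) and \(Y' = X\); \(X\) is then its extent and \(Y\) its intent.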
A closed concept can be derived as follows:
1. Select a set of objects \(X\).
2. Derive the set of attributes \(X'\) shared by these objects.
3. Derive in the same way the related objects \((X')'\).
4. \((X'',X')\) is a closed concept.
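The derivation steps above can be sketched in Python. This is a minimal illustration and not code from Contento: the toy context, the attribute labels and the function names are all assumptions made for the example.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Toy formal context (illustrative data only): licence -> policy attributes.
CONTEXT: Dict[str, Set[str]] = {
    "CC-BY": {"permit:copy", "permit:distribute", "duty:attribution"},
    "CC-BY-ND": {"permit:copy", "permit:distribute", "duty:attribution",
                 "prohibit:derive"},
    "All rights reserved": {"prohibit:copy", "prohibit:derive"},
}


def common_attributes(objects: Iterable[str],
                      context: Dict[str, Set[str]]) -> FrozenSet[str]:
    """X -> X': the attributes shared by every object in X."""
    # Start from all attributes so that the empty object set maps to all of them.
    shared = set().union(*context.values())
    for obj in objects:
        shared &= context[obj]
    return frozenset(shared)


def common_objects(attributes: Iterable[str],
                   context: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Y -> Y': the objects that have every attribute in Y."""
    wanted = set(attributes)
    return frozenset(o for o, attrs in context.items() if wanted <= attrs)


def closed_concept(objects: Iterable[str],
                   context: Dict[str, Set[str]]
                   ) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Return (X'', X') for the selected objects X."""
    intent = common_attributes(objects, context)
    extent = common_objects(intent, context)
    return extent, intent


extent, intent = closed_concept({"CC-BY"}, CONTEXT)
print(sorted(extent), sorted(intent))
```

Starting from the single object "CC-BY", the closure also pulls in "CC-BY-ND", because in this toy context it carries every attribute that "CC-BY" has.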
The same process can be performed starting from a set of attributes. A subsumption relation can be established between formal concepts in order to define an order on the set of formal concepts in a formal context. As a result, formal concepts are organized in a hierarchy, starting from a top concept (e.g., Any), including all objects and an empty set of attributes, towards a bottom concept (e.g., None), with an empty set of objects. Moreover, this ordered set forms a mathematical structure: the concept lattice.
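To make the ordering concrete, the sketch below enumerates every concept of a small context by brute force and orders them by extent inclusion. It is an illustration under stated assumptions: the data and function names are invented for the example, and closing every subset of objects is exponential, so this is only suitable for toy contexts and is not the algorithm used by Contento.

```python
from itertools import combinations

# Toy formal context (illustrative data only): licence -> policy attributes.
CONTEXT = {
    "CC-BY": {"permit:copy", "permit:distribute"},
    "CC-BY-SA": {"permit:copy", "permit:distribute", "duty:shareAlike"},
    "All rights reserved": {"prohibit:copy"},
}


def intent_of(objects, context):
    """Attributes shared by all the given objects."""
    attrs = set().union(*context.values())
    for obj in objects:
        attrs &= context[obj]
    return frozenset(attrs)


def extent_of(attributes, context):
    """Objects that have all the given attributes."""
    return frozenset(o for o, attrs in context.items() if attributes <= attrs)


def all_concepts(context):
    """Close every subset of objects and collect the distinct (extent, intent) pairs."""
    objects = sorted(context)
    found = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = intent_of(subset, context)
            found.add((extent_of(intent, context), intent))
    return found


def is_subconcept(lower, upper):
    """Subsumption: lower is below upper when its extent is contained in upper's."""
    return lower[0] <= upper[0]


concepts = all_concepts(CONTEXT)
top = max(concepts, key=lambda c: len(c[0]))     # all objects
bottom = min(concepts, key=lambda c: len(c[0]))  # no objects

for extent, intent in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(extent), sorted(intent))
```

In this toy context the top concept holds all three licences with an empty intent, and the bottom concept holds no licence and all four attributes, mirroring the Any and None concepts described above.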
The objective of the Contento tool is to support the user in the generation and curation of concept lattices from formal contexts (binary matrices) and to use them as drafts of semantic web ontologies.
The entries of a formal context can be filtered:
by object name (or all that have a given attribute)
by attribute name (or all that have a given object)
by status (holds, does not hold, to be decided).
Contento implements the Chein algorithm to compute concept lattices. The result of the algorithm is stored as a taxonomy. A taxonomy can be navigated as an ordered list of concepts, from the top to the bottom, each of them including the extent, the intent and links to upper and lower concept bounds in the hierarchy (see Fig. 2). In addition, the tool shows which objects and attributes are proper to the concept, i.e. do not exist in any of the upper (for attributes) or lower (for objects) concepts.
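The notions of upper bounds and proper items can be illustrated with a short Python sketch. It starts from an already computed set of concepts and does not reproduce the Chein algorithm; the concept names, the data and the function names are hypothetical.

```python
# Hand-written concepts of a toy lattice: name -> (extent, intent).
# All names and data are illustrative assumptions.
CONCEPTS = {
    "Any": (frozenset({"CC-BY", "CC-BY-SA", "All rights reserved"}),
            frozenset()),
    "Copyable": (frozenset({"CC-BY", "CC-BY-SA"}),
                 frozenset({"permit:copy", "permit:distribute"})),
    "ShareAlike": (frozenset({"CC-BY-SA"}),
                   frozenset({"permit:copy", "permit:distribute",
                              "duty:shareAlike"})),
    "Reserved": (frozenset({"All rights reserved"}),
                 frozenset({"prohibit:copy"})),
    "None": (frozenset(),
             frozenset({"permit:copy", "permit:distribute",
                        "duty:shareAlike", "prohibit:copy"})),
}


def above(name, concepts):
    """Concepts strictly above `name` (strictly larger extent)."""
    return [c for c in concepts if concepts[name][0] < concepts[c][0]]


def below(name, concepts):
    """Concepts strictly below `name` (strictly smaller extent)."""
    return [c for c in concepts if concepts[c][0] < concepts[name][0]]


def upper_neighbours(name, concepts):
    """Direct parents: concepts above `name` with nothing in between."""
    candidates = above(name, concepts)
    return sorted(c for c in candidates
                  if not any(concepts[d][0] < concepts[c][0] for d in candidates))


def proper_attributes(name, concepts):
    """Attributes of the concept that appear in none of its upper concepts."""
    inherited = set()
    for c in above(name, concepts):
        inherited |= concepts[c][1]
    return concepts[name][1] - inherited


def proper_objects(name, concepts):
    """Objects of the concept that appear in none of its lower concepts."""
    delegated = set()
    for c in below(name, concepts):
        delegated |= concepts[c][0]
    return concepts[name][0] - delegated


print(upper_neighbours("ShareAlike", CONCEPTS))
print(sorted(proper_attributes("ShareAlike", CONCEPTS)))
print(sorted(proper_objects("Copyable", CONCEPTS)))
```

Here the only attribute proper to the hypothetical ShareAlike concept is the share-alike duty, since the copy and distribute permissions are already present in its parent.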
Taxonomies can be translated into OWL ontologies. The user can decide how to represent the taxonomy in RDF, what terms to use to link concepts, objects and attributes, and whether items need to be represented as URIs or literals. Finally, these export configurations can be shared and reused. For example, Contento offers a default profile, using example terms, and a SKOS profile.
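As a rough illustration of what such an export could look like, the sketch below serialises a tiny annotated taxonomy as Turtle using plain string formatting. It does not reproduce Contento's export profiles: the `ex:` namespace, the class names, the use of `rdfs:comment` for the question and the function names are all assumptions made for the example.

```python
PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    "@prefix ex: <http://example.org/licences#> .\n"  # hypothetical namespace
)

# Hypothetical annotated taxonomy: class -> direct parent classes.
TAXONOMY = {
    "Distributable": [],
    "ShareAlike": ["Distributable"],
}
# Question attached to each class (stored here as an rdfs:comment).
QUESTIONS = {
    "Distributable": "Should others be allowed to distribute the work?",
    "ShareAlike": "Should derivative works use the same licence?",
}
# Licence instances and the class they belong to.
MEMBERS = [("CC-BY-SA", "ShareAlike"), ("CC-BY", "Distributable")]


def escape(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_turtle(taxonomy, questions, members):
    """Serialise classes, subclass links, questions and instances as Turtle."""
    lines = [PREFIXES]
    for cls in sorted(taxonomy):
        lines.append(f"ex:{cls} a owl:Class ;")
        for parent in taxonomy[cls]:
            lines.append(f"    rdfs:subClassOf ex:{parent} ;")
        lines.append(f'    rdfs:comment "{escape(questions[cls])}" .')
    for licence, cls in members:
        lines.append(f"ex:{licence} a ex:{cls} .")
    return "\n".join(lines)


turtle = to_turtle(TAXONOMY, QUESTIONS, MEMBERS)
print(turtle)
```

A SKOS-style profile could follow the same pattern by emitting `skos:Concept` and `skos:broader` instead of `owl:Class` and `rdfs:subClassOf`.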
3.2 The Ontology
In the above excerpt, the “CC-BY” licence permits copying, the “All rights reserved” policy prohibits it, and the “Mozilla 2.0” licence does not include a share-alike requirement.
If the concept is meaningful, name it and annotate it with a relevant question (e.g. “should others be allowed to distribute the work?”) in the comment field;
If the concept is not meaningful or useful, it can be deleted (with the lattice being automatically adjusted).
The resulting annotated taxonomy has been exported as an OWL ontology, as the initial draft of the Licence Picker Ontology. The draft included a sound hierarchy of concepts. Both concepts (classes) and licences were annotated with the respective set of policies. Because the policies were expressed as plain literals on a generic has property (the data being manipulated as object/attribute pairs by the FCA-based tool), a small refactoring was needed to reintroduce the RDF-based descriptions with ODRL. The Licence Picker Ontology currently contains 21 classes linked to 45 licences with an is-a relation. Each class is associated with a relevant question to be asked, which makes explicit the key feature of the included set of licences. The ontology embeds annotations on the classes about the policies included in all the licences of a given concept, and an ODRL-based description of the permissions, prohibitions and duties of each instance.
4 Pick the Licence
The Licence Picker Webapp welcomes the user with forty-five possible licences and a first set of questions, as shown in Fig. 5. One of them catches the eye of the user: Should the licence prohibit derivative works? She promptly answers Yes. The set of possible licences is reduced to five, and the system proposes a single question: Should the licence prohibit any kind of use (All rights reserved)? This time the user answers No, because she wants users to be able to use the information to boost the activities in the data store. As a result, the system proposes to pick one of four licences. The user notices that all of them require an attribution statement and prohibit derivative works. Two of them also prohibit use for commercial purposes, so the user decides to choose the Creative Commons CC-BY-ND 4.0 licence.
The example above shows an important property of the approach presented in the paper. As the licences are classified by means of their features, and the classes are organized in a hierarchy, we can notably reduce the number of actions needed to obtain a short list of comparable licences. The user had four requirements to fulfil, while more than fifty features existed in the database, and she obtained an easily comparable set of licences in only two steps.
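The narrowing behaviour described in this section can be sketched as follows. This is a simplified illustration, not the Licence Picker Webapp: it treats each class as a flat set of licences rather than walking the class hierarchy, and the licence memberships, question texts and function names are assumptions made for the example.

```python
# Hypothetical classes: question -> licences belonging to the class.
CLASSES = {
    "Should the licence prohibit derivative works?":
        {"CC-BY-ND", "CC-BY-NC-ND", "All rights reserved"},
    "Should the licence prohibit any kind of use (All rights reserved)?":
        {"All rights reserved"},
    "Should the licence prohibit commercial use?":
        {"CC-BY-NC", "CC-BY-NC-ND"},
}
ALL_LICENCES = {"CC-BY", "CC-BY-SA", "CC-BY-ND", "CC-BY-NC", "CC-BY-NC-ND",
                "All rights reserved"}


def answer(candidates, question, yes):
    """Keep the class members on Yes, drop them on No."""
    members = CLASSES[question]
    return candidates & members if yes else candidates - members


def open_questions(candidates, asked):
    """Only propose questions that still split the remaining candidates."""
    return [q for q, members in CLASSES.items()
            if q not in asked and 0 < len(candidates & members) < len(candidates)]


asked = []
candidates = set(ALL_LICENCES)
for question, yes in [
    ("Should the licence prohibit derivative works?", True),
    ("Should the licence prohibit any kind of use (All rights reserved)?", False),
]:
    candidates = answer(candidates, question, yes)
    asked.append(question)

remaining = open_questions(candidates, asked)
print(sorted(candidates))
print(remaining)
```

After two answers the toy candidate set is down to two licences, and the only question still worth asking is the one that distinguishes them.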
5 Conclusions and Future Work
Licences are an important part of the data publishing process, and choosing the right licence may be challenging. By applying the Licence Picker Ontology (LiPiO), this task is reduced to answering an average of three to five questions (five being the height of the class taxonomy in LiPiO) and assessing the best licence from a small set of choices. We showed how our approach significantly reduces the effort of selecting licences in contrast with approaches based on feature exploration. In addition, a bottom-up approach to ontology building in this scenario opens interesting new challenges. The RDF description of licences is ongoing work, modeling issues are not entirely solved, and we expect the data to evolve over time, eventually including new licences and new types of policies. For example, in our use case the data has been curated in advance in order to obtain a harmonized knowledge base, ready to be bridged to the Contento tool. This clearly impacts the ontology construction process and the application relying on it, as different data will lead to different classes and questions. This gives the opportunity to explore methods to automate some of the curation tasks (especially pruning) and to integrate changes in the formal context incrementally, to support the ontology designer in adapting the ontology to the changes performed in the source knowledge base. Such evolutions do not impact the Licence Picker Webapp, because changes in the ontology will be automatically reflected in the tool. We foresee that the description of licences will be extended to include other relevant properties, such as the type of assets a licence can be applied to. The advantage of the proposed methodology is that it can be applied to any kind of licence feature, not only policies.
The Contento tool was designed to support the task at the center of the present work. However, the software itself is domain independent, and we plan to apply the same approach to other domains. Finally, we want to compare Contento to other similar tools, for example ToscanaJ, and perform a user-based evaluation.
References
[1] Steyskal, S., Polleres, A.: Defining expressive access policies for linked data using the ODRL ontology 2.0. In: Sack, H., et al. (eds.) Proceedings of the 10th International Conference on Semantic Systems (SEMANTiCS 2014). ACM, New York (2014)
[2] Rodríguez-Doncel, V., Villata, S., Gómez-Pérez, A.: A dataset of RDF licenses. In: Hoekstra, H. (ed.) Legal Knowledge and Information Systems. JURIX 2014: The Twenty-Seventh Annual Conference. IOS Press, Amsterdam (2014)
[4] Cardellino, C., Villata, S., Gandon, F., et al.: Licentia: a tool for supporting users in data licensing on the web of data. In: Horridge, M., Rospocher, M., van Ossenbruggen, J. (eds.) Proceedings of the ISWC 2014 Posters Demonstrations Track, a Track within the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, 21 October 2014
[8] Li, X., Murata, T.: A knowledge-based recommendation model utilizing formal concept analysis and association. In: 2010 the 2nd International Conference on Computer and Automation Engineering (ICCAE), vol. 4, pp. 221–226. IEEE (2010)
[10] Obitko, M., Snasel, V., Smid, J.: Ontology design with formal concept analysis. In: CLA, vol. 110 (2004)
[12] d’Aquin, M., Adamou, A., Daga, E., et al.: Dealing with diversity in a smart-city datahub. In: Omitola, T., Breslin, J., Barnaghi, P. (eds.) Proceedings of the Fifth Workshop on Semantics for Smarter Cities, a Workshop at the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, 19 October 2014. CEUR-WS.org
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.