ECOLE: An Ontology-Based Open Online Course Platform

  • Vladimir Vasiliev
  • Fedor Kozlov
  • Dmitry Mouromtsev
  • Sergey Stafeev
  • Olga Parkhimovich
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9500)


The chapter presents the use cases and architecture of an educational platform built upon educational Linked Open Data. We review ontologies designed for representing educational data and discuss in detail the Enhanced Course Ontology for Linked Education (ECOLE) as an example. Its semantic core opens up a variety of new possibilities for interaction between an e-learning system and related web services and applications. Another feature of the ontology-based platform is flexible structuring and linking of open educational resources. The last part of the chapter discusses these new possibilities and analyzes trends in linked learning.


Keywords: Semantic web · Linked learning · Terminology extraction · Education · Educational ontology population

1 Introduction

Semantic technologies enable a completely new approach to learning on the Internet by means of semantic agents, and there are a number of initiatives for creating and publishing open educational data, including five-star datasets. In many cases search engines and knowledge graphs already provide sufficient support for basic online education. For more complicated educational scenarios there are resources like BBC Bitesize1, where learners can find educational content organized by means of ontologies. In our work we address the challenge of understanding how semantics can help to manage educational data and to make teaching with electronic materials more personalized with respect to the skills and knowledge background of a particular user.

To answer these questions we developed ECOLE, an experimental ontology-based open online course platform. Initially ECOLE was created with the intent to provide a framework for developing e-learning systems in the Semantic Web era, but step by step its role evolved into the Enhanced Course Ontology for Linked Education, in other words a semantic layer for linking and integrating educational resources. As a result, ECOLE exists as the semantic core of an e-learning system based entirely on OWL ontologies and semantic technologies. It should not be compared with the most advanced e-learning platforms such as edX, Moodle and others, because the purpose of ECOLE is to make electronic education more personal and flexible, to make it possible to reuse existing educational resources, and to provide more intelligent interactive teaching and analytical functionality for end users. While the most popular e-learning platforms are learning management systems for electronic and distance education, our development focuses on educational knowledge representation and e-learning analytics.

This chapter is organised as follows: Sect. 1 presents a brief survey of related work and defines the problem of interest. Section 2 describes the ontology development for all educational activities. Section 3 explains some technical details of populating the ontology, including natural language processing over educational materials. Section 4 describes the ECOLE system architecture and application, illustrated with examples of the student UI and the analytical back-end. Finally, Sect. 5 presents the evaluation results.

1.1 Related Work

There are two kinds of applications based on semantic technologies for educational purposes. Projects of the first type are based on the principles of Linked Data and are aimed at publishing research and educational data in RDF format for the search and exchange of information. Probably the biggest ones are the Linked Universities2 initiative, an alliance of European universities committed to exposing their public data as Linked Data; the VIVO3 project in the US, which provides a platform for integrating and sharing information about researchers and their activities; and the Open University4, a distance learning and research university with over 240,000 students. Projects of the second type try to use semantic data models to manage information inside learning platforms to make them more flexible, integrated, and interactive. One example of semantics usage in the field of education is mEducator, a content sharing approach to medical education based on Linked Data principles. Through standardization, it enables sharing and discovery of medical information [1]. Another example is the already mentioned Bitesize project from the BBC. One more good example of using semantics to make educational materials reusable and flexible is the SlideWiki [2] system, a collaborative OpenCourseWare platform dealing with presentations and performance assessment. It uses the CrowdLearn concept as a comprehensive data model that allows collaborative authoring of highly structured educational materials and improves their quality by means of crowd-sourcing or co-creation. An original approach to the integration of semantic technologies into an educational environment is presented in the work of F. Zablith [3]. The author describes his results on the creation of a semantic Linked Data layer for the conceptual connection of courses taught in a higher education program. He also presents applications which show how learning materials can float around courses through their interlinked concepts in e-learning environments. The last three examples are presented later in this book in detail.

With the main challenge of our work and the results presented below in mind, it is important to mention a number of ontologies that already exist in the area of e-learning. Our overview includes several examples that can be classified by their purpose into three groups:
  • modeling a structure of a course,

  • referencing to some educational resources, and

  • linking of particular parts of learning processes.

Probably the most popular ontology for the representation of courses and modules is the Academic Institution Internal Structure Ontology (AIISO)5, which provides classes and properties to describe the internal organizational structure of an academic institution.

For the representation of references in semantic formats there are the Bibliographic Ontology (BIBO)6 and the Ontology for Media Resources (MA-ONT)7. BIBO is used to describe bibliographic resources associated with a course, such as books or papers, and provides the main concepts and properties for describing citations and bibliographic references. MA-ONT describes a core vocabulary of properties and a set of mappings between different metadata formats of media resources published on the Web. MA-ONT is used to store video lectures and additional video materials.

Finally, a good example of a linking ontology is TEACH8 (Teaching Core Vocabulary), one of the most relevant and recent ontologies published in the field of education. It is a lightweight vocabulary providing terms that enable teachers to relate things in their courses together. TEACH is based on practical requirements set by providing seminar and course descriptions as Linked Data.

A more complete overview of ontologies and vocabularies for education is presented here.

We tried to reuse the ontologies listed above where possible; Sect. 2 gives a detailed description of the use of existing ontologies. At the same time, the personalization of electronic teaching with respect to the skills and knowledge background of a user remains an open question. The contribution of ECOLE here is the modeling of domain knowledge, user activity, and knowledge assessment on top of course structures and external educational resources. This includes the following key aspects:
  • Boosting the integrity of course parts by means of shared domain models and inferred indirect links (see Sect. 2.2).

  • Automating the evaluation of knowledge assessment materials with a semantic analysis of lecture coverage by tests (see Sects. 2.3 and 4.3).

  • Logging user activity in the e-learning system, aligned with the domain knowledge models (see Sect. 2.4).

  • Calculating a user's knowledge rate by domain (see Sects. 2.5 and 4.3), which yields links to the precise educational materials related to a particular domain concept, or term, that the user should repeat if his/her knowledge rate is low.

Obviously, all these aspects are beyond the functionality of traditional e-learning systems, and semantic technologies have great potential here. In the following sections we explain our ontology-based approach to solving the problem of personalizing open online education as set up in this introductory section.

1.2 Motivation

The major task in developing and maintaining an ECOLE system is choosing and interlinking relevant materials, e.g., creating associations between subject terms (or just terms in the context of this chapter) in lectures, practice, and tests. This requires domain ontologies and their population with facts from educational content. When data from different external resources is integrated into a course, it can impair the quality of the content. Therefore, one of the goals of ECOLE is to provide tools for tutors to check the quality of a course using the relations between its elements. Another way to improve the quality of a course is to use students' activity in the system. The ECOLE system can gather any kind of sophisticated statistics, e.g. statistics about students' correct and incorrect answers, allowing troublesome terms and topics to be filtered out. Teachers can use these statistics to improve the quality of their courses, and students can use personal statistics to fill their knowledge gaps.

2 Ontology Development

All data in the ECOLE system is stored in RDF using a set of developed ontologies. The data model of the ECOLE system contains three basic data layers: the Domain Data Layer, the Educational Data Layer, and the Activity Data Layer. The layers are linked with each other to support interoperability between the variety of resources in the system. The data model is shown in Fig. 1.
Fig. 1.

The data model of ECOLE system

The Domain Data Layer contains information about subject fields of education and science. This layer is the core of the ECOLE data model. Its data changes rarely and is gathered from external knowledge bases, taxonomies, and datasets such as DBpedia and Mathematics Subject Classification.

The Educational Data Layer contains educational content for teaching. The layer stores the educational programs, courses, tests, and media resources. This data is expected to change frequently and can be gathered from repositories of universities, open libraries, and media providers. The entities of the Educational Data Layer can be linked to the entities of the Domain Data Layer manually or automatically using NLP algorithms or automated reasoning.

The Activity Data Layer contains statistical data about system users. The layer stores information about students, their activity in the Learning Management System, and their learning results. The content of this layer changes all the time. It is gathered from the users of Learning Management Systems and various social networks. The entities of Activity Data Layer can be linked to the entities of the Educational Data Layer automatically using the algorithms provided by the Learning Management System.

2.1 The Ontology of Educational Resources

The ontology of educational resources describes relations between courses, modules, lectures, and terms, and helps to represent their properties and media content. The original ontology is built on top of upper-level ontologies that are commonly used in descriptions of educational resources [5]. These ontologies are briefly described in Sect. 1.1.

The ontology of educational resources9 has the following common classes: Course, Module, Lecture, Test, Exam, Practice, Field, Term, Resource. The ontology contains 32 classes, 42 object properties, and 13 datatype properties. The classes of the developed ontology are shown in Fig. 2.
Fig. 2.

Main classes of the ontology of educational resources

The most outstanding feature of this ontology is its ability to create direct and indirect interdisciplinary relations between courses [4]. E.g., the physics test “Interference and Coherence” also includes math terms (“vector”, “vector product”). Thus, if a student cannot pass this test, the system advises repeating not only the lecture “Occurrence of Interference” in the Physics course, but also the corresponding lectures from the Vector Algebra course. This is an example of indirect links between physics and vector algebra via the subject terms “vector” and “vector product”. An example is shown in Fig. 3.
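As a sketch, the inference behind this recommendation reduces to a term-overlap lookup. The course, lecture, and term names below mirror the example above, while the data-structure shapes are purely illustrative:

```python
# Toy model of indirect interdisciplinary links: a failed test is mapped,
# via its subject terms, to lectures in other courses covering the same
# terms. The data-structure shapes are illustrative, not ECOLE's.

test_terms = {
    "Interference and Coherence": {"interference", "vector", "vector product"},
}

lecture_terms = {
    ("Physics", "Occurrence of Interference"): {"interference", "coherence"},
    ("Vector Algebra", "Vectors and Operations"): {"vector", "vector product"},
    ("Vector Algebra", "Determinants"): {"determinant"},
}

def lectures_to_repeat(test):
    """Recommend every lecture sharing at least one term with the failed test."""
    terms = test_terms[test]
    return sorted(lecture for lecture, covered in lecture_terms.items()
                  if covered & terms)
```

In the real system these links are materialized as RDF relations rather than recomputed per query, but the set intersection captures the idea.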
Fig. 3.

An example of interdisciplinary relations between courses

2.2 Ontology Mapping

We use ontology mapping techniques to support interoperability amongst educational systems based on the developed ontology. Ontology mappings define correspondences between concepts in different ontologies. In this chapter, ontology mappings are used to map a concept found in the ontology of educational resources into a view, or a query, over other ontologies. We have chosen the TEACH (Teaching Core Vocabulary) ontology [6] as the target for ontology mapping and, based on its specification, we perform the mapping manually. Equivalent classes are linked using the owl:equivalentClass axiom of the OWL Web Ontology Language [7, 8], and equivalent properties using the owl:equivalentProperty axiom.
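The effect of such a mapping can be sketched as follows; the `ecole:` prefix and the specific class pairs are illustrative stand-ins for the actual mapping in Table 1:

```python
# Minimal sketch of what the owl:equivalentClass axioms buy the system:
# equivalence is symmetric, so a single declared pair links both
# directions. The "ecole:" prefix and class pairs are illustrative;
# Table 1 holds the actual mapping.

EQUIVALENT_CLASSES = {
    ("ecole:Course", "teach:Course"),
    ("ecole:Lecture", "teach:Lecture"),
}

def equivalent(a, b):
    """True if a and b are declared equivalent in either direction."""
    return (a, b) in EQUIVALENT_CLASSES or (b, a) in EQUIVALENT_CLASSES
```

An OWL reasoner derives this symmetry (and transitivity over chains of equivalences) automatically; the function above only illustrates the intended reading of the axiom.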

The results of ontology mapping between the ontology of educational resources and TEACH ontology are shown in Table 1.
Table 1.

The results of ontology mapping between ontology of educational resources and TEACH ontology

Ontology of educational resources

Teaching Core Vocabulary


2.3 The Test Ontology

To describe the content of tests, an upper ontology for test structure representation has been developed. A top-down approach was used to develop the ontologies for the educational system: each new ontology extends an existing upper ontology. The ontology10 has the following classes: Test, Testing Knowledge Item, Group of Tasks, Task, Answer, Question, Fill-in the Blank, Matching, Multiple Choice, Single Answer, Text Answer, True/False. The ontology contains 12 classes, 10 object properties, and 6 datatype properties. The classes of the developed ontology are shown in Fig. 4. The main purpose of the developed ontology is to represent the structural units of a test and to provide automatic task matching by defining semantic relations between tasks and terms [9].
Fig. 4.

Main classes of the Test Ontology

The ontology has the class “Test” to store common test characteristics, e.g. its title and description, and the class “Testing Knowledge Item” to describe test elements. The class “Testing Knowledge Item” has the subclass “Task”. The class “Group Of Tasks” [10] was added to group questions by parameters, e.g. by “difficulty”. The class “Task” has the subclass “Answer”. The class “Question” has subclasses describing question types: “Fill-in the Blank”, “Matching”, “Multiple Choice”, “Single Answer”, “Text Answer”, and “True/False”. The class “Answer” has the object properties “is wrong answer of” and “is right answer of”. Using these two object properties instead of a single data property “has answer” makes it possible to use one set of answers for many questions.
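A minimal sketch of the answer-reuse idea behind the two object properties (all identifiers are illustrative, not the ontology's):

```python
# Sketch of answer reuse via the two object properties: one Answer
# individual can be the right answer of one question and a wrong answer
# (distractor) of another. All identifiers are illustrative.

right_answer_of = {}  # answer -> set of questions it correctly answers
wrong_answer_of = {}  # answer -> set of questions it appears in as a distractor

def link(answer, question, correct):
    target = right_answer_of if correct else wrong_answer_of
    target.setdefault(answer, set()).add(question)

link("3.14", "What is pi to two decimals?", correct=True)
link("3.14", "What is e to two decimals?", correct=False)  # same individual reused

def is_correct(answer, question):
    return question in right_answer_of.get(answer, set())
```

With a single “has answer” data property, the second question would need its own copy of the answer literal; the object-property design stores the answer once and varies only the link type.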

2.4 The Ontology of Student Activity in the E-Learning System

The ontology of student activity11 is designed to store information about the student’s learning process and results. Two upper ontologies have been used for its development: the Test Ontology, as described above, and the FOAF ontology12 that describes people and relationships between them.
Fig. 5.

Main classes of the student activity in the e-learning system ontology

The ontology contains 10 classes, 15 object properties, and 5 datatype properties. The classes of the developed ontology are shown in Fig. 5. The class “Learning process” was added to store information about actions performed by a student in the system. Students can watch videos (subclass “Video”), attempt to pass tests (subclass “AttemptToPassTest”), learn terms (subclass “Term”), and pass courses (subclass “Course”). The ontology also has the class “Student” to store information about users and their activity in the system; this class is a subclass of the class “Person” defined in the FOAF ontology. The object properties “enrolled course”, “finished course”, and “subscribed course” describe relationships between the class “Student” and the class “Course”. The class “Learning results” was added to store information about students' educational activities and answers. The class “TestElement” contains information about a “Task” (a class of the test ontology) and about the student's “Answer” (a subclass of the class “LearningResults”), which can be correct or incorrect. A set of test elements constitutes an attempt to pass a test. The properties “timestamp of attempt” and “percent complete of test” allow the e-learning system to store the time at which an attempt was made and to determine the result of the test. The e-learning system uses the test ontology and the answers given by the user to build a list of terms that the user knows.

2.5 Ontology of Knowledge Rates

The knowledge rates ontology module is intended to keep term and domain knowledge rates for each student. The classes of the developed ontology are shown in Fig. 6. A term's rate shows whether the student has assimilated it. For example, if a student has watched or read a lecture with this term and has successfully passed a test with this term, we can consider that the student knows it. The ontology module contains the class “Rate” and 5 subclasses: “Lecture Term Rate”, computed as the number of lectures containing this term and viewed by the student; “Test Attempt Term Rate”, keeping attempts to pass a test with this term and the number of correct answers to tasks with the term; “Average Test Term Rate”, based on the average result of all attempts to pass one test or all tests with this term; “Total Term Rate”, based on the sum of the rates of this term; and “Domain Rate”, based on all rates of all terms from the domain the student is learning. Each class contains a data property “value” to store the numeric value of a rate. The ontology also contains object properties which link rates to the class “Student” from the educational ontology, “Test” from the test ontology, and “Term” from the terms ontology. The ontology allows adding additional “Rate” subclasses storing new metrics, as well as changing or adding the formulas used to calculate them.
Fig. 6.

Ontology of knowledge rates

With the described modules we retain all the data associated with the training of students. Let us begin with a general example. John Smith, our imaginary student, has started the “Optics” course. This course contains lectures and practical tasks (from the ontology module of education) and several tests with different groups of tasks and information about wrong and right answers (from the ontology module of tests). Each task from the tests and each lecture has terms which are described in the ontology module of terms. When the student tries to pass a test, a new individual of the class “AttemptToPassTest” is created. When the student has solved a task, his results are recorded in the ontology module of knowledge rates. Based on wrong and right answers, the metrics for the terms checked in these tasks are compiled and updated according to the formulas described below.
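The rate bookkeeping can be sketched along the lines of the class descriptions in Sect. 2.5; the concrete formulas below are assumptions for illustration, since the module is explicitly designed to let the formulas be changed:

```python
# Hedged sketch of the per-term rates from the knowledge-rates module.
# The formulas follow the class descriptions in the text; the exact
# formulas used by the system may differ and are meant to be replaceable.

def lecture_term_rate(viewed_lectures, term):
    """Lecture Term Rate: number of viewed lectures containing the term."""
    return sum(1 for terms in viewed_lectures if term in terms)

def average_test_term_rate(attempt_scores):
    """Average Test Term Rate: mean result over attempts at tests with the term."""
    return sum(attempt_scores) / len(attempt_scores) if attempt_scores else 0.0

def total_term_rate(*rates):
    """Total Term Rate: sum of the individual rates for the term."""
    return sum(rates)

viewed = [{"vector", "interference"}, {"vector"}, {"coherence"}]
lr = lecture_term_rate(viewed, "vector")   # 2 viewed lectures mention "vector"
ar = average_test_term_rate([0.5, 1.0])    # mean attempt score 0.75
total = total_term_rate(lr, ar)
```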

3 Methods of Ontology Population

3.1 Providers

The ECOLE system uses providers to collect educational linked open data. A provider supports automatic updating of linked data from external resources, can convert received structured data to the RDF data model, and stores it in a triplestore. Each provider has a separate context to manage the gathered data.

External Open Resources. The ECOLE system uses providers to get bibliographic data from open electronic libraries. The British National Bibliography (BNB)13 provides open access to bibliographic content stored in RDF format, described using the BIBO ontology. The ECOLE system collects data about books and publications from BNB and allows tutors to link their courses with BNB content using the “hasResource” property.

Linking Terms to DBpedia. The ECOLE system uses providers to get descriptions for the terms of subject fields. A subject term in the system can be linked with external resources of knowledge bases such as DBpedia, Freebase, and Wikidata. DBpedia provides a SPARQL endpoint to content extracted from Wikipedia. The provider automatically collects descriptions of the terms through queries to the SPARQL endpoint. This approach makes it possible to expand the subject term model using information from external resources.

3.2 Converting Structured Data to RDF

Many resources on the Web are still stored in structured formats rather than as RDF triples. For example, a university's educational tests can be stored in XML format, or an electronic library can share content via a REST API. The ECOLE system uses conversion methods in the provider to integrate such structured data into the educational content.

REST API. The ITMO University library shares information about books and papers via a REST API. The ECOLE system uses a Groovy script to convert the received data to RDF. During conversion, the system uses the BIBO ontology to store data about books and papers in RDF format.

XML to RDF. To convert test data from XML to RDF, a mapping was described in XML format, and an XMLProvider instance was created to perform the conversion in the system. The mapping allows XML files of tests to be converted automatically into semantic data in accordance with the test ontology. The XMLProvider uses XPath functions to extract data about objects and properties from the input XML file. The extracted data is converted into RDF/XML format based on the mapping description.
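A minimal sketch of this kind of XPath-driven conversion, using Python's standard library; the element names, properties, and URI scheme are invented for illustration:

```python
# Illustrative XPath-driven conversion in the spirit of the XMLProvider:
# path expressions select values from the test XML, which are emitted as
# triples. Element names, properties, and the URI scheme are invented.
import xml.etree.ElementTree as ET

TEST_XML = """
<test title="Interference and Coherence">
  <task id="t1"><question>Define coherence.</question></task>
  <task id="t2"><question>Compute the vector product.</question></task>
</test>
"""

def xml_to_triples(xml_text):
    root = ET.fromstring(xml_text)
    subject = "ecole:test/" + root.get("title").replace(" ", "_")
    triples = [(subject, "rdf:type", "tests:Test"),
               (subject, "tests:title", root.get("title"))]
    for task in root.findall("./task"):  # XPath-style selection
        task_uri = subject + "/" + task.get("id")
        triples.append((subject, "tests:hasTask", task_uri))
        triples.append((task_uri, "tests:question", task.findtext("question")))
    return triples

triples = xml_to_triples(TEST_XML)
```

The real XMLProvider drives this extraction from a declarative mapping file and serializes to RDF/XML; the hard-coded paths above only show the selection-and-emission pattern.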

An example of XML input, the mapping, and the output result for the test entity conversion is in Table 2.
Table 2.

Example of test entity conversion.

3.3 NLP Algorithm

The developed NLP algorithm, based on morpho-syntactic patterns, is applied for terminology extraction from course tasks; it interlinks the extracted facts (subject terms) with the instances of the system's ontology whenever educational materials are changed. These links are used to gather statistics, evaluate the quality of lecture and task materials, analyse students' answers to the tasks, and detect the difficult terminology of a course in general (for the teachers) and its understandability in particular (for every student).

Considering the small sample size and the pre-set list of lecture terms, POS-tag patterns combined with syntax patterns seem to be the most appropriate method for extracting terms from the tests [11, 12, 13]. The same algorithm was used for tests in Russian and for the tests translated into English for the demo version. About ten of the most typical compound term patterns were used to extract candidate terms (nominal compounds and adjectival compounds).

Russian compound candidate terms are transformed to the canonical form (which coincides with a headword in dictionaries) after extraction. For example, the pattern <adjective + noun> extracts an actual term <feminine adjective in instrumental case + feminine noun in instrumental case>, but lemmatization removes agreement and will produce two lemmas: <masculine adjective in nominative case> and <feminine noun in nominative case>, whereas the appropriate form of the term is <feminine adjective in nominative case + feminine noun in nominative case>. This does not influence the procedure of linking candidate terms to the knowledge base instances, but it is significant for the procedure of validating missing terms.

The NooJ linguistic engine [14] was used to extract terms. NooJ has a powerful regular expression corpus search that allows joining various POS patterns in a single grammar to query the text. Dictionaries of lexical entries (for tests and ontology terms) and inflectional grammars were written for the Russian language by the authors of this chapter. The lexical resources developed for Russian fully cover the tasks' vocabulary. To analyze English text for the demo version, standard NooJ resources were augmented and reused. NooJ dictionaries allow various linguistic information to be combined in a lexical entry.

Several derivational paradigms for Russian morphology were described with NooJ transducers and ascribed to the lexical entries [15]. Assigning derivational paradigms makes it possible to produce a common lemma for a lexical entry and its derivatives, e.g. “coplanar” and “coplanarity” will have the common lemma “coplanar”. It should be noted that NooJ descriptions allow choosing either word of the pair as the derivational basis, e.g. deriving “coplanar” from “coplanarity” with the common lemma “coplanarity”.

NooJ also has a very useful concept of a super-lemma, which allows linking all lexical variants via a canonical form and storing them in one equivalence class [16]. E.g., in our dictionary the lexical entry “rectangular Cartesian coordinate system” is attached to its acronym “RCCS” (the latter is considered the canonical form), and a query on either the acronym or the compound term matches all the variants.

The overall algorithm of term extraction inside the NLP module consists of the following steps:
  • Loading input as plain text into NooJ, which performs linguistic analysis using the provided dictionaries. The output is also plain text, but with annotations containing morphological and semantic information for every analyzed word (the Text Annotation Structure).

  • Applying queries (that is POS-tag patterns combined with syntactic patterns) stored in a single NooJ grammar file to the Text Annotation Structure. The output is a list of candidate terms.

  • Exporting the candidate terms with annotations to a text file.

To apply the NLP algorithm to other domains and languages, one needs to compile NooJ lexical resources (dictionaries), write grammars, and work out the templates to extract terms.

To map a candidate term to a system term via its lemma, system terms were also lemmatized: each system term has been assigned a text property “lemma” with a label containing the lemma of the term.

To handle links between system terms and test tasks, a new data provider was implemented. The provider supports periodic updating of the links. The input of the provider is the URI of a course entity; the provider handles all links between subject terms and test tasks of the input course.

The provider is based on the following algorithm:
  • the provider collects tasks of the course using SPARQL queries;

  • the provider forms the plain text content for each task using the information about questions and answers of the task;

  • the provider launches NLP procedures in NooJ for the plain text content of the task;

  • the provider extracts candidate terms from the NooJ output file; the data contains the canonical form and lemma(s) of each candidate term;

  • the provider searches terms in the system to link them to candidate terms by using SPARQL queries; system terms and candidate terms are linked if they have the same lemma(s);

  • the provider creates a link between selected system terms and the task by using the “hasTerm” property.
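The lemma-matching step of the provider can be sketched as follows; NooJ output and the SPARQL queries are stood in for by plain data structures, and all term URIs are illustrative:

```python
# Sketch of the provider's lemma-matching step: candidate terms from the
# NLP stage are linked to system terms that share the same lemma(s).
# NooJ output and SPARQL queries are stood in for by plain dictionaries,
# and the term URIs are illustrative.

system_terms = {  # system term URI -> lemma(s) from its "lemma" property
    "terms:vector": {"vector"},
    "terms:vector_product": {"vector", "product"},
    "terms:coherence": {"coherence"},
}

def link_candidates(candidates):
    """Return sorted (system term URI, candidate) pairs with equal lemma sets."""
    return sorted((uri, cand)
                  for cand, lemmas in candidates.items()
                  for uri, term_lemmas in system_terms.items()
                  if lemmas == term_lemmas)

candidates = {"vectors": {"vector"}, "coherent": {"coherence"}}
links = link_candidates(candidates)
```

Each resulting pair would then be materialized in the triplestore via the “hasTerm” property between the task and the matched system term.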

The algorithm of the NLP provider is shown in Fig. 7.
Fig. 7.

The algorithm of NLP provider

If a word sequence extracted with a morpho-syntactic pattern does not match any of the system terms, it becomes a candidate instance to be included in the system as a new system term. It must also be validated, e.g., via external sources. We have chosen DBpedia to validate candidate instances. A candidate instance is considered a new system term if its canonical form (a headword) completely matches the DBpedia instance's property “rdfs:label” or “dbpedia-owl:wikiPageRedirects”; otherwise it is considered a false candidate. To avoid false matches, results are filtered by the property “dcterms:subject”. Validation is considered successful if one or more DBpedia instances are matched. A validated candidate instance is added to the system as a new candidate term and is linked to the task; it becomes an authentic system term after the teacher's approval.
Fig. 8.

The overall architecture of the ECOLE system

Below is an example of a SPARQL query posted to DBpedia's SPARQL endpoint:
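The exact query may differ in detail; a hedged reconstruction, assembled in Python, that matches a candidate's canonical form against rdfs:label directly or via a redirect page and filters by dcterms:subject (the category URI is illustrative):

```python
# Hedged reconstruction of the DBpedia validation query; the authors'
# exact query may differ. A candidate's canonical form is matched against
# rdfs:label directly or via a redirect page, and results are filtered by
# dcterms:subject to avoid false matches. The category URI is illustrative.

def validation_query(canonical_form, subject_category):
    return f"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?instance WHERE {{
  {{ ?instance rdfs:label "{canonical_form}"@en . }}
  UNION
  {{ ?redirect rdfs:label "{canonical_form}"@en .
     ?redirect dbpedia-owl:wikiPageRedirects ?instance . }}
  ?instance dcterms:subject <{subject_category}> .
}}
"""

query = validation_query(
    "Vector product",
    "http://dbpedia.org/resource/Category:Vectors_(mathematics_and_physics)")
```

Validation succeeds if the endpoint returns one or more instances for the query.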

4 Implementation

4.1 Overall Architecture

The back-end is built on top of the Information Workbench platform14, which provides functions for interaction with Linked Open Data [16] and is itself built on top of open source modules. The user interface of the system is based on the Semantic MediaWiki module [17]; as an extension of the standard wiki view, Information Workbench provides predefined templates and widgets. RDF data management is based on the OpenRDF Sesame framework. The platform supports SPARQL queries, and the system exposes an open SPARQL endpoint for sharing its content. The front-end is implemented in Python15 and uses the Django Web Framework16.

The front-end collects the educational content from the SPARQL endpoint of the back-end and stores additional system data, users' private data, and user management data in local storage.

The overall architecture of the ECOLE system is shown in Fig. 8.

4.2 User Interface

The front-end of the ontology-based e-learning system is a lightweight Learning Management System. The front-end is designed to represent educational content conveniently. It also manages user data, user settings, and the results of the user’s educational process. The front-end handles content administration, restricts access to educational content, and supports widgets for video lectures, tests, and practices.

The user interface of the front-end test page is shown in Fig. 9.
Fig. 9.

The user interface of the front-end test page

Data from the educational content is extracted with SPARQL queries to the Information Workbench SPARQL endpoint [18]. The SPARQLWrapper library17 is used to compile the SPARQL queries. When a user finishes a test, the module gathering the user's statistics sends a SPARQL Update query [19] with the user's answers to the SPARQL endpoint. During this process, objects of type “AttemptToPassTest” and the user's answers to the test's tasks are created in the system. The object of type “AttemptToPassTest” is bound to a hash of the user's e-mail.
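A sketch of how an attempt can be bound to hashed e-mail data; SHA-256 and the URI scheme are assumptions, since the text only states that a hash of the e-mail is used:

```python
# Sketch of binding an "AttemptToPassTest" object to hashed e-mail data.
# SHA-256 and the URI scheme are assumptions: the text only states that
# the attempt is bound to a hash of the user's e-mail.
import hashlib

def attempt_uri(email, test_uri, timestamp):
    """Derive a stable attempt identifier without storing the raw e-mail."""
    digest = hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()
    return f"activity:attempt/{digest[:16]}/{test_uri}/{timestamp}"

uri = attempt_uri("john.smith@example.org",
                  "tests:interference", "2015-06-01T10:00:00")
```

Normalizing the e-mail before hashing keeps the identifier stable across case variations while keeping the raw address out of the triplestore.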

Upon completion of the test, the number of correct answers is displayed to the student, together with a list of subject terms to repeat. The system generates the list of problematic terms for the student using the test results and the relations between subject terms and the tasks of the test. For each subject term of the test, the system computes a rank based on the student's answers, and the list is sorted by the knowledge rank of each subject term. The student can view the knowledge rank of terms for a certain test or the global knowledge rank in the system context. These knowledge ranks help students review their knowledge of a certain subject field and improve it.

The user interface of the test result page is shown in Fig. 10.
Fig. 10.

The user interface of the test result page. 1 - low knowledge rank of term, “red zone”, 2 - medium knowledge rank of term, “yellow zone”, 3 - high knowledge rank of term, “green zone” (Color figure online)

4.3 Analytics Modules

The analysis of the quality of educational resources is performed inside the Analytics module. Each module has a separate analytics page that contains widgets, tables, and charts. The analytics page is a wiki page of the back-end system, based on the Semantic MediaWiki syntax and stored inside the Information Workbench system. The data of all UI elements on the page are obtained using SPARQL queries.

Lecture Coverage. The analysis of lecture coverage by tests is performed inside this module. Both test and lecture entities are associated with the module entity, so the results can be obtained by analyzing the terms of the tests as well as the terms of the lectures [20]. The analytics page of the module includes basic statistics and lecture coverage statistics. The basic statistics include:
  • information about the total number of covered and uncovered terms of the module,

  • the coverage ratio of the module, i.e. the number of covered terms divided by the total number of module terms,

  • a tag cloud of the most covered terms of the module,

  • a table of the test terms not covered by lectures of the module.
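Assuming the relevant term sets have already been retrieved via SPARQL, the basic coverage statistics above reduce to simple set operations. The set-based matching below is an assumption for illustration.

```python
def coverage(lecture_terms: set, test_terms: set) -> dict:
    """Compute the basic lecture-coverage statistics of a module.

    A term counts as covered when it occurs both in a lecture and in a test
    of the module; the remaining test terms fill the "not covered by
    lectures" table mentioned in the text.
    """
    covered = lecture_terms & test_terms
    total = lecture_terms | test_terms
    return {
        "covered": len(covered),
        "uncovered": len(total) - len(covered),
        "ratio": len(covered) / len(total) if total else 0.0,
        "test_terms_without_lectures": sorted(test_terms - lecture_terms),
    }
```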

The user interface of the basic statistics is shown in Fig. 11.
Fig. 11.

User interface of the basic statistics

The lecture coverage statistics show the ratio of covered terms to the total number of lecture terms for each lecture of the module and are represented as a bar chart.

The user interface of the lecture coverage statistics is shown in Fig. 12.
Fig. 12.

User interface of the lecture coverage statistics

Troublesome terms. The data about the number of correct and incorrect user answers allows the system to calculate a knowledge rating for any of the system terms. Using this rating, teachers can determine which course terms students know worst [21]. The knowledge rating is calculated by subtracting the number of incorrect answers from the number of correct answers over all tasks that contain the term. This metric is quite simple and could be replaced by a ranking formula once a proper set of features has been elaborated.

Data about users’ answers are collected with a SPARQL query to the endpoint. The analysis of troublesome terms in tests is performed inside this module. The analytics page of the module includes a bar chart of the five most difficult terms for students and a table of all terms in the module with their knowledge ratings. The user interface of the troublesome terms statistics is shown in Fig. 13.
Fig. 13.

The user interface of the troublesome terms statistics
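A minimal sketch of the rating computation described above; the input format (an iterable of term/correctness pairs extracted from user answers) is an assumption.

```python
from collections import Counter

def difficult_terms(answers, top=5):
    """Return the `top` worst-known terms by knowledge rating.

    answers: iterable of (term, is_correct) pairs, one per answered task
    containing the term. The rating is correct minus incorrect answers,
    as defined in the text; the lowest-rated terms feed the bar chart.
    """
    rating = Counter()
    for term, is_correct in answers:
        rating[term] += 1 if is_correct else -1
    return sorted(rating.items(), key=lambda kv: kv[1])[:top]
```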

The obtained analytical data help evaluate the adequacy of educational content and identify which content is sufficient and which needs to be changed or extended.

4.4 SCORM Converter

The learning content of the ECOLE system could not originally be integrated into Learning Management Systems such as Moodle, which is one of the main obstacles to integrating the system’s learning content into the local environment of a university. For this purpose, a converter to the Sharable Content Object Reference Model (SCORM) was developed. SCORM is a set of standards for e-learning systems [22, 23]. The main goal of the SCORM Converter is to export learning courses from the ontology-based e-learning system and convert them to SCORM-conformant learning content. Exporting courses to the SCORM standard makes them easier to reuse in other e-learning systems.

The designed tool addresses a range of issues, such as:
  • extracting semantic data from the ontology-based e-learning system,

  • creating learning content using predefined templates,

  • constructing SCORM-conformant learning content,

  • supporting different interfaces for interaction with the service, such as the user interface and REST API.

The SCORM Converter provides a widget for generating SCORM-conformant learning content for a given course [24].
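The construction of SCORM-conformant content centres on an imsmanifest.xml file. A minimal sketch of that step might look as follows; the input course structure, identifiers, and file naming are illustrative assumptions, not the converter's actual implementation.

```python
import xml.etree.ElementTree as ET

def build_manifest(course_title: str, lectures: list) -> str:
    """Build a minimal imsmanifest.xml skeleton for an exported course.

    One organization item and one webcontent resource are emitted per
    lecture; namespaces and schema metadata are omitted for brevity.
    """
    manifest = ET.Element("manifest", identifier="ecole-course")
    orgs = ET.SubElement(manifest, "organizations")
    org = ET.SubElement(orgs, "organization")
    ET.SubElement(org, "title").text = course_title
    resources = ET.SubElement(manifest, "resources")
    for i, lecture in enumerate(lectures, 1):
        item = ET.SubElement(org, "item", identifierref=f"res{i}")
        ET.SubElement(item, "title").text = lecture
        ET.SubElement(resources, "resource", identifier=f"res{i}",
                      type="webcontent", href=f"lecture{i}.html")
    return ET.tostring(manifest, encoding="unicode")
```

A full SCORM package would additionally zip this manifest together with the generated HTML content.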

5 Evaluation and Results

The dataset of the ECOLE system was created using manually obtained data; part of it was produced by the ontology population algorithms described in Sect. 3. The dataset contains educational objects such as courses, modules, lectures, subject fields, and books. Statistics of the ECOLE dataset are shown in Table 3.
Table 3.

The number of objects in the dataset of the ECOLE system: subject fields, subject terms, subject terms mapped with DBpedia, books from ITMO University, books from BNB.

Procedures for candidate term extraction and validation, described in Sect. 3.3, produced the results displayed in Table 4. Term validation via DBpedia as proposed in the chapter is merely an idea rather than a mature technique. Using it in the described straightforward manner, we aimed to remove the bulk of false candidates, not to validate the largest possible number of candidate terms. Overall, 30 different terms were extracted manually from 20 tasks, and five times more candidate terms were extracted using POS-patterns. At the moment, the system contains 24 terms on interference and diffraction in the physics course.
Table 4.

Evaluation of the NLP-module (for the English language). Reported metrics: percent of linked tasks; percent of non-linked tasks; number of different candidate terms; number of manually extracted different terms; percent of system terms matched by candidate terms; percent of candidate terms matched by system terms; percent of candidate terms to be included into the system terms after the validation procedure; percent of false candidates.
The obtained results seem rather ambivalent: on the one hand, 95 % of tasks were linked to at least one term. The leading system term is “Light”, which has been linked to 12 tasks. On the other hand, 50 % of system terms remained unlinked to tasks, and about 60 % of them demand the addition of proper tasks. The validation procedure using DBpedia as an external source yielded 9 terms to be added as candidate system instances (“wavelength”, “coherence”, “coherent light”, “diffraction”, “amplitude”, “aperture”, “diffraction pattern”, “optical path”, “optical path length”). We treat all the remaining terms (those that do not match any system term and failed the DBpedia validation procedure) as false candidates. However, a few of the false candidates are actually true terms that are not present in the chosen categories of DBpedia (Concepts_in_Physics, Optics and Physical_optics) but are present in other DBpedia categories (e.g. “Fresnel diffraction”, “Fresnel zone” and “Fresnel number” are in the category “Diffraction”; “Michelson interferometer” is in the category “Interferometers”). Some terms have different canonical forms in Russian and English, e.g. “Young’s interference experiment” (in DBpedia) corresponds to “Young’s experiment” in Russian (no term in DBpedia). Thus, developers depend completely upon the data segmentation of the external source. Besides, there is a far more challenging problem: a task may not explicitly contain the term it is intended to check. Consider the following example:

This task checks understanding of the Pythagorean theorem, but it contains no explicit information that would allow the system to assign proper keywords to the task. Such tasks are quite numerous.

Right now the algorithm fails to process such tasks, leaving them unlinked. Elaborating the algorithm to handle cases like this is part of future work.
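The category-based validation evaluated above can be sketched with a hard-coded stand-in for the DBpedia category lookups; the category contents below are illustrative, not actual DBpedia data.

```python
# Illustrative stand-in for the three chosen DBpedia categories; in the real
# system, membership would be looked up via the DBpedia endpoint.
CATEGORY_INDEX = {
    "Concepts_in_Physics": {"wavelength", "amplitude", "coherence"},
    "Optics": {"aperture", "optical path"},
    "Physical_optics": {"diffraction", "diffraction pattern"},
}

def validate(candidates, categories=CATEGORY_INDEX):
    """Split candidate terms into accepted terms and false candidates.

    A candidate survives only if it belongs to one of the chosen categories;
    everything else is rejected, even true terms that happen to live in
    other DBpedia categories (e.g. "Fresnel zone" in "Diffraction").
    """
    accepted = {t for t in candidates
                if any(t in terms for terms in categories.values())}
    return accepted, set(candidates) - accepted
```

This makes the limitation discussed above concrete: recall is bounded by the chosen categories.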

6 Conclusion and Future Work

The developed ontologies and population methods of the ECOLE system allow teachers to use educational content from various external resources in their electronic courses. The developed modules provide teachers with tools to maintain the relevance and quality of existing knowledge assessment modules: tutors can easily update educational resources, content, and tests, keeping them up-to-date. The ECOLE system provides a rating of the terms that cause difficulties for students. Based on this rating, teachers can change the theoretical material of a course by improving the descriptions of certain terms and adding proper tasks. The rating of subject terms is also available to students; it helps them find their knowledge gaps in subject fields and fill them.

The ECOLE system collects educational content from different sources and shares it with university learning systems. With the ECOLE system, exchange of educational content between universities and other organizations can be implemented.

Future work on the ECOLE system implies an integration of further data sources. Knowledge bases such as Wikidata can be integrated into the system to describe subject terms. Taxonomies of subject fields can be used to analyze the relations between subject terms; these relations can refine the importance estimates of subject terms.

Future work on the NLP-module implies describing a set of term periphrases. The algorithm should also filter out candidate terms that are non-thematic to the course: e.g., if the term “vector” occurs in a task on physics, it should not be marked as highly relevant to the course on interference, because it is introduced in another course. The idea is that a link is created between a system term and any term occurring in the task, but terms that do not belong to the topic of the course should not be marked as terms missing from the course.

The term extraction procedure can also be improved by adding parallel texts of tasks; the provider then needs to be refined to create test entities in several languages.

The term knowledge rating can also be refined by replacing it with a proper ranking formula that takes into account the importance of subject terms and user activity in the learning process.

The front-end of the ECOLE system can be found at

The source code of the developed ontologies can be found at

The source code of the providers can be found at

Example of analytics for the module “Interference and Coherence” can be found at

The source code of the SCORM Converter can be found at




Acknowledgements. This work was partially financially supported by the Government of the Russian Federation, Grant 074-U01.


References

  1. Hendrix, M., Protopsaltis, A., Dunwell, I., de Freitas, S., Petridis, P., Arnab, S., Dovrolis, N., Kaldoudi, E., Taibi, D., Dietze, S., Mitsopoulou, E., Spachos, D., Bamidis, P.: Technical evaluation of the mEducator 3.0 linked data-based environment for sharing medical educational resources. In: The 2nd International Workshop on Learning and Education with the Web of Data at the World Wide Web Conference, Lyon, France (2012)
  2. Khalili, A., Auer, S., Tarasowa, D., Ermilov, I.: SlideWiki: elicitation and sharing of corporate knowledge using presentations. In: ten Teije, A., Völker, J., Handschuh, S., Stuckenschmidt, H., d’Aquin, M., Nikolov, A., Aussenac-Gilles, N., Hernandez, N. (eds.) EKAW 2012. LNCS, vol. 7603, pp. 302–316. Springer, Heidelberg (2012)
  3. Zablith, F.: Interconnecting and enriching higher education programs using linked data. In: Proceedings of the 24th International Conference on World Wide Web Companion. International World Wide Web Conferences Steering Committee (2015)
  4. Mouromtsev, D., Kozlov, F., Parkhimovich, O., Zelenina, M.: Development of an ontology-based E-learning system. In: Klinov, P., Mouromtsev, D. (eds.) KESW 2013. CCIS, vol. 394, pp. 273–280. Springer, Heidelberg (2013)
  5. Keßler, C., d’Aquin, M., Dietze, S.: Linked data for science and education. Semant. Web 4(1), 1–2 (2013)
  6. Kauppinen, T., Trame, J., Westermann, A.: Teaching core vocabulary specification. LinkedScience.org, Technical report (2012)
  7. McGuinness, D.L., Van Harmelen, F.: OWL web ontology language overview. W3C recommendation (2004)
  8. Halpin, H., Hayes, P.J.: When owl:sameAs isn’t the same: an analysis of identity links on the semantic web. In: LDOW (2010)
  9. Soldatova, L., Mizoguchi, R.: Ontology of test. In: Proceedings of Computers and Advanced Technology in Education, pp. 173–180 (2003)
  10. Vas, R.: Educational ontology and knowledge testing. Electron. J. Knowl. Manage. 5(1), 123–130 (2007)
  11. Hulth, A.: Improved automatic keyword extraction given more linguistic knowledge. In: Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing (EMNLP 2003), pp. 216–223 (2003)
  12. Khokhlova, M.V.: Lexico-syntactic patterns as a tool for extracting lexis of a specialized knowledge domain. In: Proceedings of the Annual International Conference Dialogue (2012). (in Russian)
  13. Bolshakova, E., Vasilieva, N.: Formalizacija leksiko-sintaksicheskoj informacii dlja raspoznavanija reguljarnyh konstrukcij estestvennogo jazyka [Formalizing lexico-syntactic information to extract natural language patterns]. Programmnye produkty i sistemy [Software and Systems], vol. 4, pp. 103–106 (2008)
  14. Silberztein, M.: NooJ for NLP: a linguistic development environment (2002)
  15. Silberztein, M.: NooJ Manual, p. 99 (2003)
  16. Haase, P., Schmidt, M., Schwarte, A.: The information workbench as a self-service platform for linked data applications. In: COLD (2011)
  17. Krötzsch, M., Vrandečić, D., Völkel, M.: Semantic MediaWiki. In: Cruz, I., Decker, S., Allemang, D., Preist, C., Schwabe, D., Mika, P., Uschold, M., Aroyo, L.M. (eds.) ISWC 2006. LNCS, vol. 4273, pp. 935–942. Springer, Heidelberg (2006)
  18. Holovaty, A., Kaplan-Moss, J.: The Definitive Guide to Django. Apress, Berkeley (2009)
  19. Gearon, P., Passant, A., Polleres, A.: SPARQL 1.1 Update. World Wide Web Consortium (2013)
  20. Parkhimovich, O., Mouromtsev, D., Kovrigina, L., Kozlov, F.: Linking E-learning ontology concepts with NLP algorithms. In: Proceedings of the 16th Conference of Open Innovations Association FRUCT (2014)
  21. Kovriguina, L., Mouromtsev, D., Kozlov, F., Parkhimovich, O.A.: A combined method for E-learning ontology population based on NLP and user activity analysis. In: CEUR-WS Proceedings, vol. 1254, pp. 1–16 (2014)
  22. Bohl, O., Scheuhase, J., Sengler, R., Winand, U.: The sharable content object reference model (SCORM): a critical review. In: Computers in Education, pp. 950–951 (2002)
  23. Qu, C., Nejdl, W.: Towards interoperability and reusability of learning resources: a SCORM-conformant courseware for computer science education. In: 2nd IEEE International Conference on Advanced Learning Technologies, Kazan, Tatarstan, Russia (2002)
  24. Kozlov, F.: A tool to convert linked data of E-learning system to the SCORM standard. In: Klinov, P., Mouromtsev, D. (eds.) Knowledge Engineering and the Semantic Web. CCIS, vol. 468, pp. 229–236. Springer, Heidelberg (2014)

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Vladimir Vasiliev, Fedor Kozlov, Dmitry Mouromtsev, Sergey Stafeev, Olga Parkhimovich
  1. ITMO University, St. Petersburg, Russia
