1 Introduction

Ontology matching or ontology alignment is the non-trivial task of finding correspondences between entities of a set of given ontologies [10]. The matching can be performed manually or through the use of an automated matching system. For systematically evaluating the quality of such matchers, the Ontology Alignment Evaluation Initiative (OAEI) has been running campaigns [9] every year since 2005. Unlike other evaluation campaigns where researchers submit data sets as solutions to report their results (such as Kaggle), the OAEI requires participants to submit a matching system, which is then executed on-site. After the evaluation, the results are publicly reported. Therefore, execution and evaluation platforms have been developed, and OAEI participants are required to package and submit their matching system for the corresponding platform. Two well-known platforms are used in the ontology matching community: the Semantic Evaluation at Large Scale (SEALS) [12, 35] and the more recent Holistic Benchmarking of Big Linked Data (HOBBIT) [24].

Based on the results of the OAEI 2018 campaign [1], only 4 out of 12 tracks were available in HOBBIT (LargeBio, Link Discovery, SPIMBENCH, KnowledgeGraph). Out of 19 matchers that were submitted in the 2018 campaign, only 6 supported both SEALS and HOBBIT, and 2 supported HOBBIT exclusively. The remaining 11 matchers supported only SEALS. One reason for the low HOBBIT adoption might be its novelty, but packaging a matcher for the HOBBIT platform also requires more steps as well as knowledge of the Docker virtualization software. In particular for new entrants to the ontology matching community, the existing tooling might appear overwhelmingly complicated. In addition to potential obstacles in matcher development and submission, another observation from the OAEI campaigns is that the evaluation varies greatly among the different tracks that are offered: the Anatomy results, for example, contain Recall+ as well as alignment coherence, whereas the Conference track focuses on different reference alignments. Due to limited group evaluation capabilities in existing frameworks, some track organizers even developed their own evaluation systems.

For these reasons, we present the Matching EvaLuation Toolkit (MELT) – an open-source toolkit for ontology matcher development, fine-tuning, submission, and evaluation. The target audience consists of matching system developers as well as researchers who run evaluations on multiple matching systems, such as OAEI track organizers. Likewise, system developers can use this tool to analyze the performance and errors of their systems in order to improve them. Furthermore, they can easily package and submit their systems to OAEI campaigns.

The rest of this paper is structured as follows: Sect. 2 describes other work in the field of alignment visualization and evaluation. Section 3 gives an overview of the MELT framework and its possibilities whereas Sect. 4 shows an exemplary analysis of the latest systems submitted to the OAEI. We finish with an outlook on future developments.

2 Related Work

As MELT can be used both for evaluating ontology matching tools and for visualizing matching results, we discuss related work in both fields.

2.1 Matching and Alignment Evaluation Platforms

OAEI campaigns consist of multiple problem sets, so-called tracks. Each track has its organizers who provide the datasets including reference alignments, execute the matching systems, and prepare the results page for the participants and the whole community. Each track contains one or more test cases, each corresponding to a specific matching task that consists of two ontologies and a reference alignment. In 2010, three tracks (Benchmark, Anatomy, and Conference) were adjusted to be run with the SEALS platform [8]. One year later, participants of OAEI campaigns had to implement a matching interface, and the SEALS client became the main tool for executing and evaluating matchers. The interface contains a simple method (align()) which receives a URL for the source and a URL for the target ontology and has to return a URL pointing to a file that contains all correspondences in the alignment format. This format is defined and used by the Alignment API [5].
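To illustrate the contract behind this interface, the following minimal Java sketch mimics an align() implementation: two ontology URLs go in, and the URL of a file in the alignment format comes out. The class name, the hard-coded entities, and the abbreviated RDF serialization are hypothetical; a real SEALS matcher would parse both ontologies and compute actual correspondences.

```java
import java.io.IOException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the SEALS-style matching contract: two ontology
// URLs are received, and the URL of a file in the Alignment API format
// is returned. A real matcher would compute correspondences here.
public class AlignContractSketch {

    public static URL align(URL source, URL target) throws IOException {
        // Emit a single hard-coded cell for illustration purposes only.
        String alignmentXml =
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n"
          + "<rdf:RDF xmlns=\"http://knowledgeweb.semanticweb.org/heterogeneity/alignment\"\n"
          + "         xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n"
          + "  <Alignment>\n"
          + "    <map><Cell>\n"
          + "      <entity1 rdf:resource=\"" + source + "#Person\"/>\n"
          + "      <entity2 rdf:resource=\"" + target + "#Human\"/>\n"
          + "      <relation>=</relation>\n"
          + "      <measure rdf:datatype=\"xsd:float\">1.0</measure>\n"
          + "    </Cell></map>\n"
          + "  </Alignment>\n"
          + "</rdf:RDF>\n";
        Path out = Files.createTempFile("alignment", ".rdf");
        Files.writeString(out, alignmentXml);
        return out.toUri().toURL();
    }
}
```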

Starting from 2017, a second evaluation platform, called HOBBIT, was added [18]. One difference compared to SEALS is that the system has to be submitted as a Docker image to a GitLab instance, and a matcher description file has to be created in the corresponding project. After submission of the matching system, the whole evaluation runs on servers of the HOBBIT platform. Thus, the source code for evaluating the matchers has to be submitted as a Docker image as well. All Docker containers communicate with each other over a message broker (RabbitMQ). Hence, the interface between a system and the evaluation component can be arbitrary. To keep the interface similar to SEALS, the data generation component transfers two ontologies, and the system adapter receives the URLs to these files. It then returns a file analogous to the SEALS interface.

Working with alignments in Java code can be achieved with the Alignment API [5]. It is the most well-known API for ontology matching and can be used for loading and persisting alignments as well as for evaluating them with a set of possible evaluation strategies. Moreover, it provides some matching systems which are also used in OAEI campaigns as baselines. Unfortunately, it is not yet available through the Maven build system. Therefore, instead of using this API, some system developers created their own classes to work with alignments and to store them on disk in order to be compatible with the evaluation interface.

2.2 Alignment Visualization

A lot of work has been done in the area of analyzing, editing, and visualizing alignments or ontologies with a graphical user interface. One example is Alignment Cubes [15], which allows an interactive visual exploration and evaluation of alignments. An advantage is the fine-grained analysis on the level of an individual correspondence. It also allows visualizing the performance history of a matcher – for instance, which correspondences a matcher found in the most recent OAEI campaign but not in the previous one. Another framework for working with alignment files is VOAR [28, 29]. It is a Web-based system where users can upload ontologies and alignments. VOAR then allows the user to render them with multiple visualization types. The upload size of ontologies as well as alignments is restricted so that very large files cannot be uploaded.

Similar to VOAR, the SILK workbench [33] is also a Web-based tool, with a focus on link/correspondence creation between different data sets in the Linked Open Data Cloud. Unlike VOAR, it usually runs on the user’s computer. Matching operations (such as the Levenshtein distance [20]) are visualized as nodes in a computation graph. The found correspondences are displayed and can be modified to further specify which concepts should be matched.

Further visualization approaches were pursued by matching system developers to fine-tune their own systems. All these visualizations are therefore very specific to a particular matching approach. One such example is YAM++ [23], a matching system based on a machine learning approach. Results are visualized in a split view where the class hierarchy of each input ontology is shown on one side and lines are drawn between the matched classes. The user can modify the alignment with the help of this GUI. In a similar way, the developers of COMA++ [2] created a user interface for their results. A visualization of whole ontologies is not implemented by the current tools but can be achieved with the help of VOWL [21] or Web Protégé [32], for instance.

Our proposed framework MELT allows for detailed and reusable analyses such as the ones presented in this section due to its flexible metrics and evaluators. An overview of the framework is presented in the following section.

3 Matching Evaluation Toolkit

MELT is a software framework implemented in Java which aims to facilitate matcher development, configuration, packaging, and evaluation. In this section, we first introduce Yet Another Alignment API, an API for ontology alignment which is integrated into the framework. Afterwards, the matcher development process in MELT is introduced. Subsections 3.3 and 3.4 cover specific aspects of the framework that have not yet been explicitly addressed in the community: the implementation of matchers outside of the Java programming language (Subsect. 3.3) and the chaining of matchers into workflows (Subsect. 3.4). After explaining the tuning component of the framework, this section closes with the matcher evaluation process in MELT.

3.1 YAAA: Yet Another Alignment API

To allow for a simple development workflow, MELT contains Yet Another Alignment API (YAAA). It is similar to the Alignment API presented earlier but contains additional improvements such as Maven support and arbitrary indexing of correspondence elements, allowing queries such as “retrieve all correspondences with a specific source”. This is very helpful for a fast evaluation of large-scale test cases containing large reference or system alignments. The indexing is done with the cqengine library. The API is, in addition, capable of serializing and parsing alignments. It also makes sure that all characters are escaped and that the resulting XML is actually parsable. As explainability is still an open issue in the ontology matching community [7, 34], YAAA also allows for extensions to correspondences and alignments. This means that additional information such as debugging information or human-readable explanations can be added. If additional information is available in the alignment, it is also printed by the default CSVEvaluator, which allows for immediate consumption in the analysis and evaluation process and hopefully fosters the usage of additional explanations in the alignment format.
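The benefit of such indexing can be sketched in plain Java. The following class is an illustrative stand-in only – YAAA actually delegates this to the cqengine library, and the class and record names below are hypothetical – but it shows the idea: a hash index over the source entity turns the query above into a constant-time lookup instead of a linear scan over the whole alignment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of an indexed alignment (YAAA itself uses cqengine):
// correspondences are indexed by their source entity so that queries like
// "all correspondences with a specific source" avoid a full scan.
public class IndexedAlignmentSketch {

    public record Correspondence(String source, String target, double confidence) {}

    private final Map<String, List<Correspondence>> bySource = new HashMap<>();

    public void add(Correspondence c) {
        bySource.computeIfAbsent(c.source(), k -> new ArrayList<>()).add(c);
    }

    /** Index lookup instead of iterating over the entire alignment. */
    public List<Correspondence> getBySource(String source) {
        return bySource.getOrDefault(source, List.of());
    }
}
```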

It is important to note that MELT does not require the usage of YAAA for parameter tuning, executing, or packaging a matcher – it also works with other APIs such as the Alignment API. This makes it possible to evaluate matchers that were not developed using YAAA (see Sect. 4).

3.2 Matcher Development Workflow

In order to develop a matcher in Java with MELT, the first step is to decide which matching interface to implement. The most general interface is encapsulated in the class MatcherURL, which receives two URLs of the ontologies to be matched together with a URL referencing an input alignment. The return value should be a URL representing a file with correspondences in the alignment format. Since this interface is not very convenient, we also provide more specialized classes. In the matching-yaaa package, we set the alignment library to YAAA. All matchers implementing interfaces from this package have to use the library and benefit at the same time from an easier-to-handle correspondence interface. In further specializations, we also set the Semantic Web framework which is used to represent the ontologies. For better usability, the two most well-known frameworks are integrated into MELT: Apache Jena [3] (MatcherYAAAJena) and the OWL API [14] (MatcherYAAAOwlApi). As the latter two classes are organized as separate Maven projects, only the libraries which are actually required for the matcher are loaded. In addition, further services were implemented, such as an ontology cache which ensures that ontologies are parsed only once. This is helpful, for instance, when the matcher accesses an ontology multiple times, when multiple matchers work together in a pipeline, or when multiple matchers shall be evaluated. We explicitly chose a framework-independent architecture so that developers can use the full functionality of the frameworks they already know rather than having to understand an additional wrapping layer. The different levels at which a matcher can be developed, as well as how the classes presented in this section work together, are displayed in Fig. 1.

Fig. 1. Different possibilities to implement matchers

3.3 External Matching

The currently available ontology matching development and evaluation frameworks focus on the Java programming language. As researchers apply advances in machine learning and natural language processing to ontology matching, they often turn to Python, because leading machine learning libraries such as scikit-learn, TensorFlow, PyTorch, Keras, or gensim [26] are not easily available for the Java language. In the 2018 OAEI campaign, the first tools using such frameworks for ontology matching were submitted [1].

To accommodate these developments, MELT allows developing a matcher in any other programming language and wrapping it as a SEALS or HOBBIT package. To do so, the class MatcherExternal has to be extended. It transforms the given ontology URIs and input alignments into an executable command line call. The interface for the external process is simple: it receives the input variables via the command line and outputs the results via the standard output of the process – similar to many Unix command line tools. An example of a matcher implemented in Python is available on GitHub. It also contains a simple implementation of the alignment format to allow Python matchers to serialize their correspondences.
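The external protocol itself can be sketched from the matcher's side. The Java class below is a hypothetical stand-in (MELT's actual conventions for MatcherExternal may differ in detail): the ontology locations arrive as command-line arguments, and the alignment document – here drastically abbreviated – is written to standard output.

```java
// Hypothetical sketch of the external-matcher protocol: ontology locations
// arrive as command-line arguments, and the resulting alignment is written
// to standard output, like many Unix command line tools. MELT's actual
// MatcherExternal conventions may differ in detail.
public class ExternalMatcherSketch {

    /** Builds a minimal (abbreviated) alignment document for the two given ontology URIs. */
    public static String match(String sourceUri, String targetUri) {
        return "<?xml version=\"1.0\"?>\n"
             + "<Alignment>\n"
             + "  <onto1>" + sourceUri + "</onto1>\n"
             + "  <onto2>" + targetUri + "</onto2>\n"
             + "  <!-- correspondences would be computed and listed here -->\n"
             + "</Alignment>\n";
    }

    public static void main(String[] args) {
        // args[0]: source ontology URI, args[1]: target ontology URI
        System.out.print(match(args[0], args[1]));
    }
}
```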

When executing the matcher with the SEALS client, the matching system is loaded into the Java virtual machine (JVM) of the SEALS client (i.e., the evaluation code) with a customized class loader. This raises two points: (1) the code under test is executed in the same JVM and could potentially access the evaluation code; (2) the class loader used from the JCL library does not implement all methods of a class loader (specifically getPackage() and getResource()). However, these methods are used by other Java libraries to load operating-system-dependent files contained in the jar file. Thus, some libraries do not work when evaluating a matcher with SEALS. Another problem is that the libraries used by the matching system may collide with the libraries used by SEALS. Since everything runs in the same JVM instance, this can cause issues with Jena and other Semantic Web frameworks. To solve this issue, MatcherExternal can be used not only for matchers written in another programming language, but also for Java matchers which use dependencies that are incompatible with the SEALS platform.

3.4 Pipelining Matchers

Ontology matchers often combine multiple matching approaches, and different systems sometimes consist of the same building blocks. An example would be a string-based matching of elements, followed by the application of a stable marriage algorithm or another matching refinement step on the resulting similarity matrix.

Following this observation, MELT allows for the chaining of matchers: The alignment of one matcher is then the input for the next matcher in the pipeline. The ontology caching services of MELT mentioned above prevent performance problems arising from repetitive loading and parsing of ontologies.

In order to execute a matcher pipeline, the classes MatcherPipelineYAAA (for matchers that use different ontology management frameworks), MatcherPipelineYAAAJena (for pure Jena pipelines), and MatcherPipelineYAAAOwlApi (for pure OWL API pipelines) can be extended. Here, the initializeMatchers() method has to be implemented. It returns the matcher instances as a List in the order in which they shall be executed. These reusable parts of a matcher can easily be uploaded to GitHub to allow other developers to use common functionality.
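The chaining principle can be illustrated with a simplified stand-in for these pipeline classes. The Matcher interface and both implementations below are hypothetical and operate on a plain map of correspondences rather than MELT's alignment objects; the point is that each stage receives the alignment produced so far and may extend or filter it.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for MELT's pipeline classes: each matcher receives
// the alignment produced so far and may extend or filter it. The Matcher
// interface and both implementations are hypothetical.
public class PipelineSketch {

    interface Matcher {
        Map<String, String> match(Map<String, String> inputAlignment);
    }

    /** Adds candidate correspondences (stands in for a string-based matcher). */
    static class CandidateMatcher implements Matcher {
        public Map<String, String> match(Map<String, String> input) {
            Map<String, String> result = new HashMap<>(input);
            result.put("onto1#Person", "onto2#Human");
            result.put("onto1#Papr", "onto2#Article"); // low-quality candidate
            return result;
        }
    }

    /** Removes weak candidates (stands in for a refinement step). */
    static class FilterMatcher implements Matcher {
        public Map<String, String> match(Map<String, String> input) {
            Map<String, String> result = new HashMap<>(input);
            result.remove("onto1#Papr");
            return result;
        }
    }

    /** Runs the matchers in order, feeding each one the previous alignment. */
    public static Map<String, String> run(List<Matcher> pipeline) {
        Map<String, String> alignment = new HashMap<>();
        for (Matcher m : pipeline) {
            alignment = m.match(alignment);
        }
        return alignment;
    }
}
```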

3.5 Tuning Matchers

Many ontology matching systems require parameters to be set at design time. These can significantly influence the matching system’s performance. An example of such a parameter is the threshold of a matcher utilizing a normalized string distance metric. For tuning such a system, MELT offers a GridSearch functionality. It requires a matcher and one or more parameters together with their corresponding search spaces, i.e. the values that shall be tested. The Cartesian product of these values is computed, and each system configuration (an element of the Cartesian product, which is a tuple of values) is run on the specified test case. The result is an ExecutionResultSet which can be further processed like any other matcher result in MELT. To speed up the execution, the class Executor was extended so that it can run matchers in parallel. Properties can be specified by a simple string. To this end, the JavaBeans specification is used to access the properties via so-called setter methods. This strategy also allows changing properties of nested classes or of any list or map. An example of matcher tuning can be found in the MELT repository.
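The core of the grid search idea can be sketched as follows: enumerate the Cartesian product of the search spaces, score each configuration, and keep the best one. The evaluate() function below is a made-up stand-in (a real run would execute the matcher and compute an evaluation score such as F1); the Config record and its parameters are likewise hypothetical.

```java
import java.util.List;

// Minimal illustration of the GridSearch idea: evaluate every element of
// the Cartesian product of the parameter search spaces and keep the best
// configuration. The scoring function is a made-up stand-in for a real
// matcher evaluation run.
public class GridSearchSketch {

    public record Config(double threshold, boolean useSynonyms) {}

    /** Stand-in for "run the matcher with this config and compute F1". */
    static double evaluate(Config c) {
        double score = 1.0 - Math.abs(c.threshold() - 0.8); // artificial peak at 0.8
        return c.useSynonyms() ? score + 0.05 : score;
    }

    public static Config bestConfig(List<Double> thresholds, List<Boolean> synonymOptions) {
        Config best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        // Cartesian product of both search spaces
        for (double t : thresholds) {
            for (boolean s : synonymOptions) {
                Config c = new Config(t, s);
                double score = evaluate(c);
                if (score > bestScore) {
                    bestScore = score;
                    best = c;
                }
            }
        }
        return best;
    }
}
```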

3.6 Evaluation Workflow

MELT defines a workflow for matcher execution and evaluation. To this end, it uses the vocabulary of the OAEI: a matcher can be evaluated on a TestCase, i.e. a single ontology matching task. One or more test cases are summarized in a Track. MELT contains a built-in TrackRepository which allows accessing all OAEI tracks and test cases at design time without actually downloading them from the OAEI Web page. At runtime, TrackRepository checks whether the required ontologies and alignments are available in the internal buffer; if data is missing, it is automatically downloaded and cached for the next access. The caching mechanism is an advantage over the SEALS platform, which re-downloads all ontologies at runtime and thus slows down the evaluation process when run multiple times in a row.

One or more matchers are given, together with the track or test case on which they shall be run, to an Executor. The Executor runs a matcher or a list of matchers on a single test case, a list of test cases, or a track. The run() method of the executor returns an ExecutionResultSet. The latter is a set of ExecutionResult instances which represent individual matching results on a particular test case. Lastly, an Evaluator accepts an ExecutionResultSet and performs an evaluation. To do so, it may use one or more Metric objects. MELT contains various metrics, such as a ConfusionMatrixMetric, and evaluators. Nonetheless, the framework is designed to allow for the further implementation of evaluators and metrics.
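The arithmetic behind a confusion-matrix based metric can be sketched in a few lines. MELT's ConfusionMatrixMetric operates on richer alignment objects, but conceptually it compares a system alignment against a reference alignment – represented here simply as sets of correspondence strings – and derives precision, recall, and F1.

```java
import java.util.HashSet;
import java.util.Set;

// Conceptual sketch of a confusion-matrix metric: intersect the system
// alignment with the reference alignment to obtain the true positives,
// then derive precision, recall, and the F1 score from the set sizes.
public class ConfusionMatrixSketch {

    public static double[] precisionRecallF1(Set<String> system, Set<String> reference) {
        Set<String> truePositives = new HashSet<>(system);
        truePositives.retainAll(reference);
        double precision = system.isEmpty() ? 0.0
                : (double) truePositives.size() / system.size();
        double recall = reference.isEmpty() ? 0.0
                : (double) truePositives.size() / reference.size();
        double f1 = (precision + recall == 0.0) ? 0.0
                : 2 * precision * recall / (precision + recall);
        return new double[] {precision, recall, f1};
    }
}
```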

After the Executor has run, an ExecutionResult can be refined by a Refiner. A refiner takes an individual ExecutionResult and derives a smaller, more specific result from it. An example is the TypeRefiner, which creates additional execution results depending on the type of the alignment (classes, properties, datatype properties, object properties, instances). Another example of an implemented refiner is the ResidualRefiner, which only keeps non-trivial correspondences and can be used for metrics such as recall+. Refiners can be combined. This means that MELT can calculate very specific evaluation statistics such as the residual precision of datatype property correspondences.

A novelty of this framework is also the granularity at which alignments can be analyzed: the EvaluatorCSV writes every correspondence in a CSV format together with further details about the matched resources and the performed refinements. This allows for an in-depth analysis in various spreadsheet applications such as LibreOffice Calc, where filters can be used to answer analytical queries such as “false-positive datatype property matches by matcher X on test case Y”.

4 Exemplary Analysis of OAEI 2018 Results

In order to demonstrate the capabilities of MELT, a small analysis of the OAEI 2018 results for the Conference and Anatomy track has been performed and is presented in the following.

The Conference track consists of 16 ontologies from the conference domain. We evaluated all matching systems that participated in the 2018 campaign: ALIN [30], ALOD2Vec [25], AML [11], DOME [13], FCAMapX [4], Holontology [27], KEPLER [19], Lily [31], LogMap and LogMapLt [17], SANOM [22], as well as XMap [6].

The Anatomy track consists of a mapping between the human anatomy and the anatomy of a mouse. In the 2018 campaign, the same matchers mentioned above participated with the addition of LogMapBio, a matcher from the LogMap family [17].

First, the resulting alignments for Anatomy and Conference were downloaded from the OAEI Web site. As both result sets follow the same structure every year, the MELT functions Executor.loadFromAnatomyResultsFolder() and Executor.loadFromConferenceResultsFolder() were used to load the results. The resulting ExecutionResultSet was then handed over to the MatcherSimilarityMetric and rendered using the MatcherSimilarityLatexHeatMapWriter. As the Conference track consists of multiple test cases, the results have to be averaged. Here, out of the available calculation modes in MELT, micro-average was chosen because this calculation mode is also used on the official results page to calculate precision and recall scores. Altogether, the analysis was performed with a few lines of Java code.

Tables 1 and 2 show the Jaccard overlap [16] of the correspondences rendered as a heat map, where darker colors indicate a higher similarity. The Jaccard coefficient \(J \in [0, 1]\) between two alignments \(a_1\) and \(a_2\) with correspondences \(corr(a_1)\) and \(corr(a_2)\) was obtained as follows:

$$\begin{aligned} J(a_1, a_2) = \frac{|corr(a_1) \cap corr(a_2)|}{|corr(a_1) \cup corr(a_2)|} \end{aligned}$$
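The formula translates directly into code. The sketch below computes the Jaccard coefficient for two alignments represented, for simplicity, as sets of correspondence strings:

```java
import java.util.HashSet;
import java.util.Set;

// Jaccard coefficient of two alignments, following the formula above:
// |intersection| / |union| of their correspondence sets.
public class JaccardSketch {

    public static double jaccard(Set<String> a1, Set<String> a2) {
        if (a1.isEmpty() && a2.isEmpty()) {
            return 1.0; // two empty alignments are identical
        }
        Set<String> intersection = new HashSet<>(a1);
        intersection.retainAll(a2);
        Set<String> union = new HashSet<>(a1);
        union.addAll(a2);
        return (double) intersection.size() / union.size();
    }
}
```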

In Table 1, it can be seen that – despite the various approaches pursued by the matching systems – most of them arrive at very similar alignments. One outlier in this statistic is Holontology. This is due to the very low total number of correspondences found by this matching system (456, as opposed to ALIN, which produced the second-smallest alignment with 928 matches).

Similarly, the matching systems on the Conference track also show commonalities in their alignments, although the similarity here is less pronounced than on the Anatomy track: the median similarity (excluding perfect similarities due to self-comparisons) of matching systems for Anatomy is \(median_{Anatomy} = 0.7223\), whereas the median similarity for Conference is \(median_{Conference} = 0.5917\). The lower matcher similarity median indicates that Conference is a harder matching task because the matching systems disagree more about certain correspondences.

Table 1. OAEI anatomy 2018 alignment similarity
Table 2. OAEI conference 2018 alignment similarity

In a second step, the same result from the MatcherSimilarityMetric has been printed by another writer (MatcherSimilarityLatexPlotWriter) which plots the mean absolute deviation (MAD) on the X-axis and the \(F_1\) score on the Y-axis. MAD was obtained for each matcher by applying

$$\begin{aligned} MAD = \frac{1}{n} \sum ^{n}_{i=1} | x_i - mean(X)| \end{aligned}$$

where X is the set of Jaccard similarities for a particular matcher. The resulting plots are shown in Figs. 2 and 3. It can be seen that the matchers form different clusters: Anatomy matchers with a high \(F_1\) measure also have a high deviation. Consequently, those matchers are likely candidates for a combination to achieve better results. On Conference, on the other hand, good combinations cannot be derived because the best matchers, measured by their \(F_1\) score, tend not to deviate much in their resulting alignments.
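The MAD formula above can be computed with a few lines of plain Java, here applied to a list of Jaccard similarities of one matcher against all others:

```java
import java.util.List;

// Mean absolute deviation as defined above: the average absolute
// difference between each value and the mean of all values.
public class MadSketch {

    public static double meanAbsoluteDeviation(List<Double> values) {
        double mean = values.stream()
                .mapToDouble(Double::doubleValue).average().orElse(0.0);
        return values.stream()
                .mapToDouble(v -> Math.abs(v - mean)).average().orElse(0.0);
    }
}
```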

Fig. 2. Matcher comparison using MAD and \(F_1\) on the Anatomy data set

Fig. 3. Matcher comparison using MAD and \(F_1\) on the Conference data set

In addition to the evaluations performed using the matcher similarity metric, the EvaluatorCSV was run with the OAEI 2018 matchers on the Anatomy and Conference tracks. The resulting CSV file contains one row for each correspondence together with additional information about each resource that is mapped (e.g. label, comment, or type) and about the correspondence itself (e.g. residual match indicator or evaluation result). All files are available online for further analysis at the correspondence level.

5 Conclusion

With MELT, we have presented a framework for ontology matcher development, configuration, packaging, and evaluation. We hope to lower the entrance barriers into the ontology matching community by offering a streamlined development process. Through its rich evaluation capabilities, MELT can also simplify the work of researchers who evaluate multiple matchers on multiple data sets, such as OAEI track organizers.

The evaluation capabilities were exemplarily demonstrated for two OAEI tracks by providing a novel view on matcher similarity. The MELT framework as well as the code used for the analyses presented in this paper are open-source and freely available.

Future work will focus on adding more evaluation possibilities in the form of further refiners and reasoners, providing more default matching functionalities such as modular matchers that can be used in matching pipelines, and developing visual evaluation support based on the framework to allow for better ontology matcher results comparisons.