
1 Introduction

In the context of the Semantic Web (SW), ontologies and the ability to reason on them via deductive methods play a key role. However, standard inference mechanisms have shown their limitations due to the incompleteness of ontological knowledge bases deriving from the Open World Assumption (OWA). In order to overcome this problem, alternative forms of reasoning, such as inductive reasoning, have been adopted to perform various tasks such as concept retrieval and query answering [1, 2]. These tasks have been cast as a classification problem, consisting in deciding the class-membership of an individual with respect to a query concept, to be solved through inductive learning methods that exploit statistical regularities in a knowledge base. The resulting models can be directly applied to the knowledge base or mixed with deductive reasoning capabilities [3]. Although the application of these methods has shown interesting results and the ability to induce assertional knowledge that is not logically derivable, it has also revealed some problems due to the aforementioned incompleteness. In general, the individuals that are positive and negative instances for a given concept may not be equally distributed. This skewness may be even stronger when considering individuals whose membership cannot be assessed because of the OWA. This class-imbalance setting may affect the model, resulting in poor performance.

Various methods have been devised for tackling the problem, spanning from sampling methods to ensemble learning approaches [4]. Concerning the specific task of instance classification for inductive query answering on SW knowledge bases, we investigated the use of ensemble methods [5], where the resulting model is built by training a certain number of classifiers, called weak learners, whose predictions are combined by a rule acting as the meta-learner. Specifically, we proposed an algorithm for inducing Terminological Random Forests (TRFs) [5], an ensemble of Terminological Decision Trees (TDTs) [6]. The method extends Random Forests and First Order Random Forests [7, 8] to the case of DL representation languages. When these models are employed, the membership of a test individual is decided according to a majority vote rule (although various strategies for combining predictions have been proposed [9–11]): each classifier returning a vote in favor of a class contributes equally to the final decision. In this way, some aspects are not considered explicitly, such as the uncertainty about the class label assignment and the disagreement that may exist among weak learners. The latter plays a crucial role for the performance of ensemble models [12]. In the specific case of TRFs, we noted that most misclassifications were related to situations in which votes are distributed evenly across the admissible labels.

A weighted voting procedure may be an alternative strategy to mitigate the problem, but it requires a criterion for setting the weights. In this sense, introducing a meta-learner that manipulates soft predictions of each classifier (i.e. a prediction with a confidence measure for each class value) rather than hard predictions (where a single class value is returned) may be a solution. For TRFs, this can be done by considering the extension of TDT models based on the Dempster-Shafer Theory (DS) [13], which provides an explicit representation of ignorance and uncertainty (differently from the original version proposed in [6]). In machine learning, resorting to the DS operators is a well-known solution [14]. Most of the existing ensemble combination methods rely on decision templates, which are obtained by organizing, for each classifier and each class, a mean vector (called reference vector). When these methods are employed, predictions are typically made by computing the similarity between the decision profile of an unknown instance and the decision templates. Other approaches that do not require the computation of these matrices have been proposed [14]. However, all these methods consider a propositional representation. Additionally, none of them has been employed for predicting assertions on ontological knowledge bases.

The main contribution of the paper concerns the definition of a framework for the induction of Evidential Terminological Random Forests for ontological knowledge bases. This is an ensemble learning approach that employs Evidential TDTs (ETDTs) [13] and does not require the computation of decision templates, similarly to [14]. After the induction of the forest, a new individual is classified by combining, by means of Dempster's rule [15], the available evidence on its membership coming from each tree.

The remainder of the paper is organized as follows: the next section recalls the basics of the Dempster-Shafer Theory; Sect. 3 presents the novel framework for evidential terminological random forests, while Sect. 4 describes a preliminary empirical evaluation. Sect. 5 draws conclusions and illustrates perspectives for further developments.

2 Basics on the Dempster-Shafer Theory

The Dempster-Shafer Theory (DS) is basically an extension of Bayesian subjective probability. In the DS, the frame of discernment is a set of exhaustive and mutually exclusive hypotheses \(\varOmega =\{ \omega _1, \omega _2,\ldots , \omega _n \}\) about a domain. For instance, the frame of discernment for a classification problem could be the set of all admissible class values. Starting from this set, it is possible to define a Basic Belief Assignment (BBA) as follows:

Definition 1

(Basic Belief Assignment). Given a frame of discernment \(\varOmega =\{ \omega _1, \omega _2, \ldots , \omega _n \}\), a Basic Belief Assignment (BBA) is a function \(m: 2^{\varOmega } \rightarrow [0,1]\) such that:

$$\begin{aligned} \sum _{A \in 2^{\varOmega }} m(A)=1 \end{aligned}$$
(1)

Given a piece of evidence, the value of a BBA \(m\) for a set \(A\) expresses a measure of belief committed exactly to \(A\): the value \(m(A)\) implies no further claims about any of its subsets. In particular, the mass assigned to \(A=\varOmega \) represents the case of total ignorance. Each element \(A \in 2^{\varOmega }\) for which \(m(A)>0\) is said to be a focal element for \(m\). The function \(m\) can be used to define other functions, such as the belief and the plausibility function.
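For illustration (a minimal sketch, not taken from the paper), a BBA over the two-element frame used later in this work can be represented as a mapping from focal elements to masses:

```python
# Frame of discernment {-1, +1}, as used for class-membership later in the paper.
POS = frozenset({+1})
NEG = frozenset({-1})
OMEGA = frozenset({-1, +1})  # the whole frame: total ignorance

def is_valid_bba(m, tol=1e-9):
    """Check that the masses assigned to the focal elements sum to 1 (Eq. 1)."""
    return abs(sum(m.values()) - 1.0) < tol

# Example: some evidence for the positive class, some mass left to ignorance.
m = {POS: 0.6, NEG: 0.1, OMEGA: 0.3}
assert is_valid_bba(m)
```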

Definition 2

(Belief Function and Plausibility Function). For a set \(A \subseteq \varOmega \), the belief in A, denoted \(Bel(A)\), represents a measure of the total belief committed to A given the available evidence.

$$\begin{aligned} \forall A, B \in 2^{\varOmega }\quad Bel(A)=\sum _{B\subseteq A} m(B) \end{aligned}$$
(2)

The plausibility of A, denoted \(Pl(A)\), represents the amount of belief that could be placed in A, if further information became available.

$$\begin{aligned} \forall A, B \in 2^{\varOmega } \quad Pl(A)=\sum _{B\cap A\ne \emptyset } m(B) \end{aligned}$$
(3)

It can be proved that knowing just one among \(m\), \(Bel\) and \(Pl\) allows one to derive the other two functions [16].
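Under the same dictionary-based representation sketched above, \(Bel\) and \(Pl\) can be computed directly from Definition 2 (again, an illustrative sketch rather than the authors' implementation):

```python
def bel(m, a):
    """Belief of a set a: total mass of focal elements contained in a (Eq. 2)."""
    return sum(mass for b, mass in m.items() if b <= a)

def pl(m, a):
    """Plausibility of a: total mass of focal elements intersecting a (Eq. 3)."""
    return sum(mass for b, mass in m.items() if b & a)

# With m = {POS: 0.6, NEG: 0.1, OMEGA: 0.3} as above:
#   bel(m, POS) = 0.6              (only {+1} is contained in {+1})
#   pl(m, POS)  = 0.6 + 0.3 = 0.9  ({+1} and the whole frame intersect {+1})
```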

In the DS, various measures for quantifying the amount of uncertainty have been proposed, e.g. the non-specificity measure [17]. The latter can be regarded as a measure of the imprecision of a BBA. It can be computed by the following equation:

$$\begin{aligned} Ns=\sum _{A\in 2^{\varOmega }}m(A)\log (|A|) \end{aligned}$$
(4)

It is easy to note that the non-specificity value is higher when the focal elements are larger subsets of \(\varOmega \), i.e. sets about whose elements no further claims can be made.
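A small illustrative sketch of Eq. (4), assuming base-2 logarithms (the paper does not fix the base):

```python
from math import log2

def non_specificity(m):
    """Non-specificity of a BBA (Eq. 4), using base-2 logarithms."""
    return sum(mass * log2(len(a)) for a, mass in m.items() if mass > 0)

# A vacuous BBA (total ignorance) is maximally non-specific,
# while a BBA concentrated on a singleton is fully specific.
vacuous  = {frozenset({-1, +1}): 1.0}   # Ns = log2(2) = 1
definite = {frozenset({+1}): 1.0}       # Ns = log2(1) = 0
assert non_specificity(vacuous) == 1.0 and non_specificity(definite) == 0.0
```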

One of the most important aspects of the DS is the availability of various operators for pooling evidence from different sources of information. One of them, called Dempster's rule, aggregates independent pieces of evidence defined within the same frame of discernment. Let \(m_1\) and \(m_2\) be two BBAs. The new BBA \(m_{12}\), obtained by combining \(m_1\) and \(m_2\) using the rule of combination, can be expressed as the orthogonal sum of \(m_1\) and \(m_2\). Generally, the normalized version of the rule is used:

$$\begin{aligned} \forall A \subseteq \varOmega , A \ne \emptyset \quad m_{12}(A) = (m_1\oplus m_2)(A) = \frac{1}{1-c}\sum _{ B\cap C=A} m_1(B)m_2(C) \end{aligned}$$
(5)

where the conflict \(c\) can be computed as \( c= \sum _{B \cap C = \emptyset } m_1(B) m_2(C)\).

In the DS, the assumed independence of the available pieces of evidence is typically a strong constraint; it can be relaxed by using further combination rules, e.g. the Dubois-Prade rule [18]:

$$\begin{aligned} m_{12}(A) = \sum _{B\cup C=A} m_1(B)m_2(C) \end{aligned}$$
(6)

Differently from Dempster's rule, the latter considers the union of two sets of hypotheses rather than their intersection. As a result, no conflict between the sources of information arises.
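Both combination rules can be sketched as follows (an illustrative implementation over the dictionary-based BBAs introduced above, not the authors' code):

```python
from itertools import product

def dempster(m1, m2):
    """Normalized Dempster's rule (Eq. 5): focal elements are intersected and the
    combined mass is renormalized by 1 - c, where c is the conflict."""
    combined, conflict = {}, 0.0
    for (b, mb), (c_, mc) in product(m1.items(), m2.items()):
        inter = b & c_
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mb * mc
        else:
            conflict += mb * mc
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    return {a: mass / (1.0 - conflict) for a, mass in combined.items()}

def dubois_prade(m1, m2):
    """Combination as in Eq. 6: the product of masses is assigned to the union
    of the focal elements, so no conflict arises."""
    combined = {}
    for (b, mb), (c_, mc) in product(m1.items(), m2.items()):
        u = b | c_
        combined[u] = combined.get(u, 0.0) + mb * mc
    return combined

# Example: two sources mildly disagreeing about the membership.
m1 = {frozenset({+1}): 0.7, frozenset({-1, +1}): 0.3}
m2 = {frozenset({-1}): 0.4, frozenset({-1, +1}): 0.6}
pooled = dempster(m1, m2)   # conflict c = 0.7 * 0.4 = 0.28
```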

3 Evidence-Based Ensemble Learning for Description Logic

The TDT (and TRF) learning approach is now recalled before introducing the method for the induction of evidence-based versions of these classification models.

3.1 Class-Imbalance and Terminological Random Forests

In machine learning, the class-imbalance problem concerns the skewness of the training data distribution. In a multiclass setting, where the number of class labels can be greater than two, the problem usually occurs when the number of training instances belonging to a particular class (the majority class) overwhelms the number of those belonging to the other classes (the minority classes). In order to tackle the problem, various strategies based on sampling have been proposed [19]. One of the simplest methods is an under-sampling strategy that randomly discards instances belonging to the majority class in order to re-balance the dataset. However, this method causes a loss of information due to the possible discarding of useful examples required for inducing a sufficiently predictive model. A Terminological Random Forest (TRF) is an ensemble model trained through a procedure that combines a random under-sampling strategy with ensemble learning [5]. The main purpose of these models is to mitigate the loss of information mentioned above in the context of SW knowledge bases. A TRF is basically made up of a certain number of Terminological Decision Trees (TDTs) [6], each of them built from a (quasi-)balanced dataset. The ensemble model assigns the final class for a new individual by appealing to a majority vote procedure. Therefore each TDT returns a hard prediction: each tree contributes equally to the decision concerning the class label, regardless of its confidence in the prediction. In order to take this kind of information into account and to tackle further issues such as the uncertainty about the class assignment (i.e. when the confidence values for the different classes are approximately equal) and the disagreement between classifiers, which may lead to misclassifications [5], we need to resort to other models for the ensemble approach, such as Evidential Terminological Decision Trees [13].
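As an illustration of the sampling idea (a sketch with hypothetical names, not the procedure from [5]), a balanced bootstrap sample over lists of positive and negative training individuals might look as follows; the `rate` parameter loosely plays the role of the stratified sampling rate used in the experiments:

```python
import random

def balanced_bootstrap_sample(positives, negatives, rate=1.0, rng=random):
    """Sketch of a balanced bootstrap: sample each class with replacement and
    keep the majority class down to the size of the minority sample;
    uncertain-membership individuals are simply left out."""
    positives, negatives = list(positives), list(negatives)
    n = max(1, int(rate * min(len(positives), len(negatives))))
    pos_sample = [rng.choice(positives) for _ in range(n)]
    neg_sample = [rng.choice(negatives) for _ in range(n)]
    return pos_sample, neg_sample
```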

3.2 Evidential Terminological Decision Trees

In [13], it has been shown how the class-membership prediction task can be tackled by inducing Evidential Terminological Decision Trees (ETDTs), an extension of TDTs [6] based on evidential reasoning. ETDTs are defined similarly to TDTs. However, unlike TDTs, each node contains a pair \(\langle D,m \rangle \), where \(D\) is a DL concept description and \(m\) is a BBA concerning the membership w.r.t. \(D\), rather than the sole concept description. In practice, to learn an ETDT model, a set of candidate concept descriptions is generated from the concept installed in the current node by resorting to a refinement operator, denoted by \(\rho \). For each candidate, a BBA is also computed by considering the positive, negative and uncertain instances w.r.t. the generated concept. Then the best description (and the corresponding BBA) is selected, i.e. the one having the smallest non-specificity value w.r.t. the previous level. In other words, the selected description is the one inducing the most definite membership.
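A sketch of this selection step is given below; `count_memberships` is a hypothetical helper returning the numbers of positive, negative and uncertain individuals w.r.t. a candidate concept, and the non-specificity helper mirrors Eq. (4):

```python
from math import log2

def bba_from_counts(n_pos, n_neg, n_unc):
    """Turn the counts of positive, negative and uncertain individuals w.r.t. a
    candidate concept into a BBA over the frame {-1, +1} (counts assumed > 0 in total)."""
    total = n_pos + n_neg + n_unc
    return {frozenset({+1}): n_pos / total,
            frozenset({-1}): n_neg / total,
            frozenset({-1, +1}): n_unc / total}

def select_best_candidate(candidates, count_memberships):
    """Among the candidate refinements, pick the pair (concept, BBA) whose BBA has
    the smallest non-specificity, i.e. the most definite membership."""
    non_specificity = lambda m: sum(mass * log2(len(a)) for a, mass in m.items() if mass > 0)
    scored = [(concept, bba_from_counts(*count_memberships(concept))) for concept in candidates]
    return min(scored, key=lambda pair: non_specificity(pair[1]))
```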

Figure 1 reports a simple example of an ETDT used for predicting whether a car is to be sent back to the factory (SendBack) or can be repaired. We can observe that the root concept \(\mathsf {\exists hasPart.\top }\) is progressively specialized. Additionally, the concepts installed in the intermediate nodes are characterized by decreasing non-specificity values.

Fig. 1. A simple example of an ETDT: each node contains a DL concept description and a BBA obtained by counting the instances that reach the node during the training phase

3.3 Evidential Terminological Random Forests

An Evidential Terminological Random Forest (ETRF) is an ensemble of ETDTs. We will focus on the procedures for producing an ETRF and for predicting the class-membership of input individuals by exploiting an ETRF. Moving from the formulation of the concept learning problem proposed in [5], we will use the label set \(\mathcal {L}=\{-1, +1\}\) as the frame of discernment of the problem. The labels in \(\mathcal {L}\) denote the cases of negative and positive membership w.r.t. a target concept \(C\). However, in order to represent the uncertain membership related to the Open World Assumption, we will employ the label set \(\mathcal {L}'=2^\mathcal {L}\setminus \{\emptyset \}\): the singletons \(\{+1\}\) and \(\{-1\}\) denote the positive and negative membership w.r.t. \(C\), while the case of uncertain membership is labeled by \(\mathcal {L}=\{-1,+1\}\) itself.

Algorithm 1. Growing an Evidential Terminological Random Forest

Growing ETRFs. Algorithm 1 describes the procedure for producing an ETRF. It requires the target concept \(C\), a training set \(\mathsf {Tr}\subseteq \mathsf Ind (\mathcal {A})\) and the desired number of trees \(n\). \(\mathsf {Tr}\) may contain not only positive and negative examples but also instances with uncertain membership w.r.t. \(C\). According to a bagging approach, the training individuals are sampled with replacement in order to obtain \(n\) subsets \(\mathsf {D}_{i} \subseteq \mathsf Tr \), with \(i=1,\ldots , n\). Various sampling strategies could be applied to obtain the \(\mathsf{D}_i\)s; in this work, we followed the approach proposed in [5]. Firstly, the initial data distribution is taken into account by adopting a stratified sampling w.r.t. the class-membership values, so that instances of the minority class are represented. In the second phase, undersampling can be performed on the training set in order to obtain (quasi-)balanced \(\mathsf D _i\) sets (i.e. with a class imbalance that will not affect the training process much). This means that if the majority class is the negative one, the exceeding part of the counterexamples is randomly discarded; in the dual case, positive instances are removed. In addition, the sampling procedure also removes all the uncertain instances. In Algorithm 1, the procedure implementing this strategy and returning the sets \(\mathsf{D}_i\) is BalancedBootstrapSample. For each \(\mathsf {D}_{i}\), an ETDT \(T\) is built by means of a recursive strategy, as described in [13], implemented by the procedure induceETDT. It distinguishes various cases. The first one uses the prior probability (estimate) to cope with the lack of examples (\(|\mathsf Ps |=0\) and \(|\mathsf Ns |=0\)). The second one sets the class label for a leaf node if it is sufficiently pure, i.e. no positive (resp. negative) example is found while most examples are negative (resp. positive). This purity condition is evaluated by considering the BBA \(m\) given as input to the algorithm (\(m(\{-1\}) \simeq 0\) and \(m(\{+1\}) > \theta \), or \(m(\{+1\}) \simeq 0\) and \(m(\{-1\}) > \theta \)). The values of the BBA for the membership values are obtained by counting the positive, negative and uncertain-membership instances w.r.t. the current concept. Finally, the third (recursive) case concerns the availability of both negative and positive examples. In this case, the current concept description \(D\) has to be specialized by means of an operator exploring the search space of downward refinements of \(D\). Following the approach described in [5, 8], the refinement step produces a set of candidate specializations \(\rho (D)\) and a subset of them, namely \(RS\), is then randomly selected (via the function RandomSelection) by setting its cardinality according to the value returned by a function \(f\) applied to the cardinality of the set of specializations returned by the refinement operator (e.g. \(\sqrt{|\rho (D)|}\)). A BBA \(m'\) is then built for each candidate \(E \in RS\), again by counting the number of positive, negative and uncertain-membership instances. Then the best pair \( \langle E^*, m^* \rangle \) among the candidates, according to the non-specificity measure employed in [13], is determined by the selectBestCandidate procedure and finally installed in the current node. Specifically, the procedure tries to find the pair \(\langle E^*,m^*\rangle \) having the smallest non-specificity value.
After the best pair is assessed, the individuals are partitioned by the procedure split into the left or right branch according to the result of the instance check w.r.t. \( E^* \), maintaining the same group (\( \mathsf P ^{l/r} \), \( \mathsf N ^{l/r} \), or \( \mathsf U ^{l/r} \)). Note that a training example \( a \) is replicated in both children when both \(\mathcal {K}\not \models E^*(a)\) and \(\mathcal {K}\not \models \lnot E^*(a)\) hold. The divide-and-conquer strategy is applied recursively until the instances routed to a node satisfy one of the stopping conditions discussed above.
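The outer loop of Algorithm 1 can be sketched as follows, reusing the hypothetical `balanced_bootstrap_sample` helper above and assuming an `induce_etdt` procedure implementing the recursive induction of [13]:

```python
import random

def grow_etrf(target_concept, positives, negatives, n_trees, rate=1.0, rng=random):
    """Sketch of the outer loop of Algorithm 1: each ETDT is induced from its own
    (quasi-)balanced bootstrap sample of the training individuals."""
    forest = []
    for _ in range(n_trees):
        pos_i, neg_i = balanced_bootstrap_sample(positives, negatives, rate, rng)
        tree = induce_etdt(target_concept, pos_i, neg_i)  # recursive induction as in [13]
        forest.append(tree)
    return forest
```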

Algorithm 2. Class-membership prediction with an ETRF

Prediction. After an ETRF is produced, predictions can be made relying on the resulting classification model. The related procedure, sketched in Algorithm 2, works as follows. Given the individual to be classified, for each tree \(T_i\) of the forest, the procedure classify returns the BBAs assigned to the leaves reached from the root along paths down the tree. Specifically, the algorithm traverses the ETDT recursively by performing an instance check w.r.t. the concept contained in each node that is reached: let \(a \in \mathsf Ind (\mathcal {A})\) and \(D\) the concept installed in the current node; if \(\mathcal {K}\models D(a)\) (resp. \(\mathcal {K}\models \lnot D(a)\)) the left (resp. right) branch is followed. If neither \(\mathcal {K}\models D(a)\) nor \(\mathcal {K}\models \lnot D(a)\) holds, both branches are followed. After the exploration of a single ETDT, the list \(L\) may contain several BBAs. In this case, the BBAs are pooled according to a combination rule such as the Dubois-Prade rule [13]. The function classify returns the BBA combined according to this rule (denoted by the symbol \( \bigoplus \)). After polling all the trees, the set of BBAs deriving from the previous phase is exploited to decide the class label for the test individual \(a\). The function classifyByTRF takes an individual \(a\) and a forest \(F\); it iterates over the trees of the forest, collecting the BBAs via the function classify. Then, the BBAs are pooled according to a further combination rule, which can be different from the one employed during the exploration of a single ETDT. Additionally, this combination rule should be an associative operator [15], so that the result is not affected by the order in which the BBAs are pooled. In our experiments we combined these BBAs via Dempster's rule (denoted by the symbol \(\bigoplus \) in the function classifyByTRF). By using this rule, the disagreement between classifiers, which corresponds to the conflict exploited as a normalization factor, is explicitly considered by the meta-learner. The final decision is then made according to the belief function values computed from the pooled BBA \(\overline{m}\). In this case, we aim to select the \(l \in 2^\mathcal {L}\) which maximizes the value of the function. However, in order to cope with the monotonicity of the belief function, which can easily lead to returning an unknown membership as the final prediction, the meta-learner compares the values for the positive and negative class labels and assigns the unknown membership if they are approximately equal. This is done by comparing the difference between the belief values against a threshold \(\epsilon \).
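A compact sketch of Algorithm 2 is given below; the node and knowledge-base interfaces (`is_leaf`, `entails`, `entails_negation`) are hypothetical, and the `dempster` and `dubois_prade` helpers are those sketched in Sect. 2:

```python
from functools import reduce

def classify_with_tree(tree, individual, kb):
    """Traverse one ETDT, collecting the BBAs of every leaf reached by the individual,
    and pool them with the Dubois-Prade rule."""
    leaf_bbas = []
    def traverse(node):
        if node.is_leaf():
            leaf_bbas.append(node.bba)
        elif kb.entails(node.concept, individual):            # K |= D(a): left branch
            traverse(node.left)
        elif kb.entails_negation(node.concept, individual):   # K |= not D(a): right branch
            traverse(node.right)
        else:                                                  # uncertain test: follow both
            traverse(node.left)
            traverse(node.right)
    traverse(tree.root)
    return reduce(dubois_prade, leaf_bbas)

def classify_with_etrf(forest, individual, kb, eps=0.1):
    """Pool the per-tree BBAs with Dempster's rule, then decide via the belief values."""
    pooled = reduce(dempster, (classify_with_tree(t, individual, kb) for t in forest))
    pos, neg = frozenset({+1}), frozenset({-1})
    bel_pos = sum(mass for a, mass in pooled.items() if a <= pos)
    bel_neg = sum(mass for a, mass in pooled.items() if a <= neg)
    if abs(bel_pos - bel_neg) < eps:        # nearly tied beliefs: uncertain membership
        return frozenset({-1, +1})
    return pos if bel_pos > bel_neg else neg
```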

4 Preliminary Experiments

The experimental evaluation aims at assessing the effectiveness of the classification based on the ETRF models and the improvement in terms of prediction w.r.t. TRFs. We provide the details of the experimental setup, then present and discuss the outcomes.

4.1 Setup

Various Web ontologies have been considered in the experiments (see Table 1). They are available in the TONES repository. For each ontology, 15 query concepts were randomly generated by combining (using the conjunction and disjunction operators, or universal and existential restrictions) 2 through 8 (primitive or defined) concepts of the ontology.

Table 1. Ontologies employed in the experiments

As in previous works [5, 13], because of the limited population of the considered ontologies, all the individuals occurring in each ontology were employed as (training or test) examples.

A 10-fold cross validation design of the experiments was adopted so that the final results are averaged for each of the considered indices (see below). We compared our extensions with other tree-based classifiers: TDTs [6], TRFs [5] and ETDTs [13].

In order to learn each ETDT from a balanced set of examples, a stratified sampling was required (see Sect. 3). Three stratified sampling rates for the \( \mathsf D _i \)s were set in our experiments, namely 50 %, 70 % and 80 %.

Finally, forests with an increasing number of trees were induced, namely 10, 20 and 30. For each tree in a forest, the number of randomly selected candidates was determined as the square root of the number of candidate refinements: \(\sqrt{\mid \rho (\cdot )\mid }\). We employed these settings for training both ETRFs and TRFs. As in previous works [5, 6, 13], to compare the predictions made using the forests against the ground truth assessed by a reasoner, the following indices were computed (a small sketch of their computation is given after the list):

  • match rate (M%), i.e. test individuals for which the inductive model and a reasoner agree on the membership (both \(\{+1\}\), \(\{-1\}\), or \(\{-1,+1\}\));

  • commission rate (C%), i.e. test cases where the determined memberships are opposite (i.e. \(\{+1\}\) vs. \(\{-1\}\) or vice versa);

  • omission rate (O%), i.e. test cases for which the inductive method cannot determine a definite membership while the reasoner can (\(\{-1,+1\}\) vs. \(\{+1\}\) or \(\{-1\}\));

  • induction rate (I%), i.e. test cases where the inductive method can predict a definite membership while the reasoner cannot assess it (\(\{+1\}\) or \(\{-1\}\) vs. \(\{-1,+1\}\)).
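A minimal sketch of how these indices can be computed from pairs of predicted and reasoner-derived memberships (each expressed as \(\{+1\}\), \(\{-1\}\) or \(\{-1,+1\}\)):

```python
def evaluation_rates(predictions, ground_truth):
    """Compute match (M%), commission (C%), omission (O%) and induction (I%) rates
    from predicted and reasoner-derived memberships."""
    UNK = frozenset({-1, +1})
    match = commission = omission = induction = 0
    for pred, truth in zip(predictions, ground_truth):
        if pred == truth:
            match += 1
        elif pred == UNK:        # model abstains while the reasoner is definite
            omission += 1
        elif truth == UNK:       # model is definite while the reasoner cannot decide
            induction += 1
        else:                    # opposite definite memberships
            commission += 1
    n = len(predictions)
    return {'M%': match / n, 'C%': commission / n, 'O%': omission / n, 'I%': induction / n}
```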

Table 2. Results of experiments with TDTs and ETDTs models
Table 3. Comparison between TRFs and ETRF with sampling rate of 50 %
Table 4. Comparison between TRFs and ETRF with sampling rate of 70 %
Table 5. Comparison between TRFs and ETRF with sampling rate of 80 %
Table 6. Differences between the results for the TRF and ETRF models. The symbol \(\bullet \) denotes a positive or negative difference in favor of ETRFs, while the symbol \(\circ \) denotes a difference in favor of TRFs

4.2 Results

As regards the distribution of the instances w.r.t. the target concepts, we observed that negative instances outnumbered the positive ones in BCO and Human Disease (HD). In the case of BCO this occurred for all concepts but one, with a ratio between positive and negative instances of 1 : 20. In the case of HD this kind of imbalance occurred for all the queries; moreover, the number of instances with an uncertain membership is very large (about 90 %). On the other hand, in the case of NTN, we noted a predominance of positive instances: for most concepts the ratio between positive and negative instances was 12 : 1, and a lot of uncertain-membership instances were found (again, over 90 %). A weaker imbalance could be noted with BioPax: for most query concepts the ratio between positive and negative instances was 1 : 5. In addition, for most query concepts, uncertain-membership instances were lacking; such instances were available only for 2 queries. The class distribution was balanced for three concepts only.

Tables 2, 3, 4 and 5 report the results of this empirical evaluation, while Table 6 shows the differences between the indices for TRFs and ETRFs. In general, we can observe that ensemble methods perform better than, or in the worst cases as well as, the single-classifier approach for most ontologies. For example, when we compare ETRFs with ETDTs, a significant improvement was obtained for BioPax (the match rate was around 96 % for ETRFs and 87 % for ETDTs). For BCO, there was a more limited improvement: it was only around \(1.31\,\%\), likely due to the number of examples available in BCO. In this case, when ETRF models were induced, there was a larger overlap between the ETDTs in the forest and the sole ETDT model employed in the single-classifier approach, i.e. the models were very similar to each other.

As regards the comparison between the ETRF and TRF models, an improvement of the match rate and a corresponding decrease of the induction rate were observed for BCO. The improvement was around \(6\,\%\) for the match rate and about \(3\,\%\) for the induction rate when a sampling rate of \(50\,\%\) was employed. The improvement of the match rate was larger when sampling rates of 70 % and 80 % were employed: in this case, the addition of further instances led to an improvement of the predictiveness of the ETRFs. The ensemble of models proposed in this paper showed a more conservative behavior w.r.t. the original version. It can be noted that the increase of the match rate was mainly due to uncertain-membership instances that were no longer classified as induction cases, as a result of the belief function values employed for making decisions. Another cause is related to the lack of omission cases: here, the procedure for forcing the answer leads to a decision in favor of the correct class-membership value. Besides, the commission rate did not change in a significant way. The proposed extension is also more stable in terms of standard deviation: for ETRFs, this value is lower than the one obtained for TRFs.

With BioPax, we observed again an increase of the match rate and a significant decrease of the commission rate. The induction rate was also larger with ETRFs than with TRFs, likely due to the procedure for forcing the answer. As regards the experiments on the HD and NTN ontologies, we can observe that, differently from the original version of TRFs, the induction rate was very high when ETRFs were employed. For the latter case, this result was mainly due to the original data distribution, which showed an overwhelming number of uncertain instances. As previously mentioned, they represented about 50 % of the total number of instances in the ABox of HD and about 90 % for NTN. TRFs showed a conservative behavior by returning an unknown membership (due to uncertain results of the intermediate tests during the exploration of trees [5]), which tends to preserve the matches with the gold-standard membership also in the case of uncertain membership. This explains the high match rate observed in the experiments. The ETRF models, instead, showed a bolder behavior, also due to the forcing procedure: they tend to more easily assign a positive or negative membership to a test instance, leading to an increase of the induction rate up to about \( 89\,\% \), while omission cases were absent. Induction cases represent new, non-derivable knowledge that can be potentially useful for ontology completion; however, their large number suggests that the result may also be due to existing noise (also caused by the employment of the entire ABox as dataset). This basically means that most induced assertions may not be definitely related to the learned concepts, but they cannot be considered real errors like commission cases.

Similarly to our previous experiments in [5], we also observed that the generated concept descriptions installed as nodes of each ETDT often did not improve the quality of the splits, similarly to the case of TDTs, where the training was led by the information gain criterion. This occurred for all the datasets considered here. In both cases, most instances were sent along one branch, while a small number of them were sent along the other one. This means that the small disjuncts problem is common to both TRFs and ETRFs, and neither the information gain nor the non-specificity measure can be considered a fully suitable measure for selecting the concept description used to split instances during the training phase. A further remark concerns the predictiveness of the proposed method w.r.t. both the sampling rates and the number of trees in a forest. Also for ETRFs, the performance did not change significantly when a larger number of trees was set or when the algorithm resorted to a larger stratified sampling rate. While in the former case the results are likely due to a weak diversification between the ETDTs, in the latter case the result was likely due to the availability of examples whose employment did not change the quality of the splits generated during the growth process. For ETRFs, similarly to TRF models, the refinement operator is still a bottleneck for the learning phase: execution times spanned from a few minutes to almost 10 h, as in the experiments proposed in [5]. Moreover, when an intermediate test with an uncertain result was encountered, the exploration of alternative paths further affected the efficiency of the proposed method.

5 Conclusion and Extensions

We have proposed an algorithm for inducing Evidential Terminological Random Forests, an extension of Terminological Random Forests devised to tackle the class-imbalance problem when learning predictive classification models for SW knowledge bases. Like the original version, the algorithm combines a sampling approach with ensemble learning techniques. The resulting models combine predictions that are represented as basic belief assignments rather than votes, by exploiting combination rules in the context of the Dempster-Shafer Theory for making the final decision. In addition, a preliminary empirical evaluation with publicly available ontologies has been performed. The experiments have shown that the new classification model seems to be more predictive than the previous ones and tends to assign a definite membership. Besides, the predictiveness of the model appears sufficiently tolerant to variations of the number of trees and of the sampling rate, and the standard deviation is also lower than for the original TRFs. In the future, we plan to extend the method along various directions. One regards the choice of the refinement operator, which may be improved in order to generate more discriminative intermediate tests. This plays a crucial role for the quality of the classifiers involved in the ensemble, in order to obtain sufficiently predictive weak learners from both expressive and shallow ontologies extracted from the Linked Data cloud [20]. To cope with the latter case, the method could be parallelized so that it can be employed as a non-standard tool to reason over such datasets. Further ensemble techniques and novel rules for combining the answers of the weak learners could also be employed; for example, weak learners could be induced from subsets of training instances generated by means of a procedure based on cross-validation rather than sampling with replacement. Finally, further investigations may concern the application of strategies aiming to optimize the ensemble, which is an important characteristic of such learning methods [12, 21].