
1 Introduction

Most machine learning techniques are purely data-driven and thus ignore the vast amount of knowledge already captured in domain models [1], such as SNOMED [2], SSN [3] and UMLS [4]. The advantage of a purely data-driven approach is that the resulting model is robust to outliers and noise. The disadvantages are that a computationally expensive training phase needs to be executed and that valuable prior knowledge is not taken into account. In many critical domains, such as electronic health care and law enforcement, where wrong decisions can have significant repercussions, knowledge-based systems such as expert systems were long preferred [5], as they can easily provide a comprehensible explanation along with their predictions. Moreover, they can be deployed without requiring a lot of data, which was rather hard to collect prior to the big data era. The main disadvantages of a purely knowledge-based approach are that its performance is completely determined by the content of the knowledge base [6], which can take a lot of time to construct and maintain, and that it is unable to learn new patterns or insights. Moreover, such an approach is often not robust, e.g. in the case of conflicting rules or samples that do not comply with any of the defined rules.

Within the data-driven approaches, two large families of techniques can be distinguished. First, there are black-box techniques, such as artificial neural networks, which are often able to learn features automatically, thus not requiring a feature extraction and selection phase, and which tend to achieve high predictive performance [7]. However, they cannot provide an explanation for their predictions, making them impractical in applications where decision support, rather than decision making, is crucial. Second, white-box techniques, such as decision tree induction and classification rule mining, construct an easily comprehensible predictive model from the data. While the predictive performance of these techniques tends to be lower than that of their black-box counterparts, they are able to give a corresponding explanation, making them ideally suited to provide decision support for experts within critical domains.

Given the advantages of both data-driven and knowledge-based approaches, advancements within the machine learning domain, the growth of data within all domains [8] and the vast amount of prior knowledge already available on the Semantic Web, a hybrid approach seems to be ideal. In such an approach, a white-box predictive model, such as a decision tree or an ordered rule list, is constructed from the given data with incorporation of prior knowledge in each of its steps. Ideally, the advantages of both approaches would be retained, i.e. robustness to outliers and noise, ability to give a corresponding explanation, a less expensive and more performant training phase and the ability to deduce new insights and knowledge.

The remainder of this paper is structured as follows. A use case, which will be used as a running example throughout the rest of this paper, is presented in Sect. 2, followed by a discussion of the related work in Sect. 3. A problem statement with corresponding hypotheses and research questions is presented in Sect. 4. A methodology to provide an answer to these research questions is proposed in Sect. 5. Then, we discuss how our future research will be evaluated in Sect. 6 and, finally, a conclusion is given in Sect. 7.

2 Use Case: Primary Headache Diagnosis

Primary headaches [9] are an increasingly common health issue in modern society with a large detrimental impact. In Europe, they have a prevalence of more than 50% and, according to the World Health Organization (WHO), severe headache attacks are among the top 10 most disabling conditions [10]. Currently, diagnosing a patient correctly takes a lot of time, because many different aspects need to be taken into account and many different types of primary headache exist. Furthermore, a lot of research by medical experts has already been done in the headache domain, resulting in a vast amount of available prior domain and expert knowledge [11]. Therefore, the automatic diagnosis of primary headaches seems an ideal use case for combining the data-driven and knowledge-driven approaches, where doing so can have a very positive impact. For my master dissertation, a mobile headache journal was developed that allows headache patients to register their headache attacks and medicine consumption. The semantically annotated data generated by this mobile application, in combination with background knowledge [4], can be used to generate a decision tree in order to support an expert in making a correct diagnosis. An overview of this work-flow can be found in Fig. 1. This use case will be used as a running example throughout this paper and, in addition to well-known benchmark datasets, to evaluate the different proposed techniques.

Fig. 1. Schematic overview of the machine learning work-flow, with incorporation of prior knowledge into the different phases, to diagnose a patient with primary headache. A patient enters all required information concerning his medicine consumption and headache attacks in a mobile application. This data is fed, together with background knowledge, to a machine learning process in order to discover new features, balance the distribution of classes and automatically select features. The feature selection process is based on a graph created by the physician.

3 Related Work

Combining the advantages of knowledge-driven and data-driven approaches, sometimes referred to as semantic data mining, has been investigated before. Two very thorough and recent surveys can be found in [12, 13]. A traditional white-box data-driven approach consists of several main steps, which can be identified in Fig. 2. In a first step, numerical features that have a high discriminative power are extracted from the raw data, which is optionally pre-processed first. Pre-processing examples include applying transformations to the data or generating and removing samples to balance the dataset. When all features are extracted, a selection phase is applied in order to discard the uninformative features, which allows for better generalization. Finally, a white-box model is constructed from the selected features. In the following subsections, related work for each of these phases is presented.

Fig. 2. The different steps of a white-box machine learning approach and how prior knowledge can be incorporated.

3.1 Automatic Feature Discovery

In a typical machine learning work-flow, a very large amount of time is spent on data cleaning and feature extraction. Generic features, which can be applied to a large number of problems, are available, but the most effective features often require some prior knowledge about the task to solve. Facilitating this feature extraction process by exploiting the concept of linked data to automatically discover new informative features could therefore significantly reduce the time required to create a predictive model. In order to do this, entities in the training set are mapped to a URI which corresponds to a node in the graph of linked data. From there, we can traverse edges to discover new features [14,15,16]. While this is a very interesting approach, many possible optimizations are left, such as automatic measurement of feature importance, heuristics to decide when to stop traversing the immensely large graph and pruning parts of the graph in order to reduce the gigantic search space.
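The fragment below gives a minimal sketch of such a traversal with rdflib: starting from an entity's URI, it walks the linked-data graph breadth-first up to a fixed depth and collects reachable literal values as candidate features. The DBpedia URI, the depth limit and the use of live dereferencing are illustrative assumptions, not part of the cited approaches.

```python
# Minimal sketch: breadth-first feature discovery over linked data with rdflib.
from collections import deque

from rdflib import Graph, Literal, URIRef

def discover_candidate_features(entity_uri: str, max_depth: int = 2) -> dict:
    """Collect literal values reachable from one entity as candidate features,
    keyed by the predicate path that leads to them."""
    g = Graph()
    g.parse(entity_uri)  # DBpedia-style URIs serve RDF via content negotiation; a local dump works too

    features = {}
    queue = deque([(URIRef(entity_uri), (), 0)])
    visited = {URIRef(entity_uri)}

    while queue:
        node, path, depth = queue.popleft()
        if depth >= max_depth:
            continue  # stop criterion: never traverse deeper than max_depth levels
        for predicate, obj in g.predicate_objects(subject=node):
            new_path = path + (str(predicate),)
            if isinstance(obj, Literal):
                # a literal reachable via this predicate path is a candidate feature
                features["->".join(new_path)] = obj.toPython()
            elif isinstance(obj, URIRef) and obj not in visited:
                visited.add(obj)
                try:
                    g.parse(obj)  # lazily pull in the neighbouring resource
                except Exception:
                    continue  # some URIs do not dereference to RDF
                queue.append((obj, new_path, depth + 1))
    return features

# Hypothetical usage: one row of a zoo-style dataset mapped to a DBpedia URI.
# candidates = discover_candidate_features("http://dbpedia.org/resource/Lion")
```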

3.2 Class Balancing

In the classification domain, a dataset is called imbalanced when the distribution of the classes in the training set is skewed. Imbalanced datasets are very common in the financial and medical domains, e.g. in fraud and epilepsy detection respectively. Class imbalance gives rise to a few potential problems. First, the classifier will be biased towards the most populated class, as this class has the highest impact on the objective function it is trying to optimize, while it is often the class of least importance to the expert. Second, general metrics to evaluate the model, such as accuracy, give a wrong representation of the predictive performance [17, 18]. Two broad approaches to tackle data imbalance can be identified. On the one hand, sampling techniques can remove or create samples in order to make the distribution of the classes more uniform [19]. On the other hand, the classification algorithm can be modified (e.g. by adapting the objective function) to pay more attention to samples in the minority class [20, 21]. Sampling techniques are very interesting, as they can be applied as a pre-processing step of the machine learning work-flow and can therefore be seen as model-agnostic. They can be divided into oversampling, where the number of samples in the minority class is increased, and undersampling, where the number of samples in the majority class is decreased. In current state-of-the-art oversampling algorithms, such as SMOTE [22] and ADASYN [23], virtual samples of the minority class are generated using only the small amount of data available, and thus no prior knowledge is used. On the other hand, researchers have already attempted to generate ‘virtual’ samples based solely on the available prior knowledge [24,25,26,27,28]. While the latter research attempts were done in the context of data augmentation rather than imbalanced datasets, a hybrid approach that combines the positive characteristics of both can be very interesting.
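As a point of reference for the purely data-driven side, the fragment below shows the standard SMOTE oversampling step with imbalanced-learn; the synthetic dataset is only there to keep the sketch self-contained.

```python
# Minimal sketch of the purely data-driven baseline: SMOTE from imbalanced-learn.
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Illustrative skewed dataset: ~95% majority class, ~5% minority class.
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.95, 0.05], random_state=42)
print("before:", Counter(y))

# SMOTE interpolates between minority samples until the classes are balanced.
X_resampled, y_resampled = SMOTE(random_state=42).fit_resample(X, y)
print("after:", Counter(y_resampled))
```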

3.3 Feature Selection

When all of the possible features have been extracted from the raw data, a selection phase can optionally be applied in order to remove uninformative features. This can mitigate the curse of dimensionality and thus possibly increase the generalization capability of the model, while reducing the amount of training time required. The research field dealing with incorporating prior knowledge into the selection phase is still very young and immature. In [29], the Semantic Sensor Network (SSN) ontology [3] is adapted to allow for automatic feature selection. Here, features are selected based on dependency relations, defined by an expert, between predictor variables or between a predictor variable and the target variable. This technique has a lower computational complexity than current feature selection techniques, as it depends only on the number of features and not on the number of data samples, which can become very large in many cases. Moreover, in contrast to dimensionality reduction techniques such as t-SNE [30] and PCA [31], the interpretability of the features is maintained and the selection phase only has to be re-applied when new features are added to the model, instead of whenever a certain amount of new samples is added. Unfortunately, this technique is still rather simplistic and is equivalent to manual feature selection.

4 Problem Statement

By analyzing the state of the art, one open problem can be identified:

P1. Current white-box machine learning techniques learn from scratch and often only use a limited amount of information (i.e. the training set), as they do not make full use of the vast amount of prior background and expert knowledge available in ontologies and on the web of linked data [32].

To address this problem, several research questions need to be answered first:

Q1. Can we improve existing techniques, or develop new ones, that map the entities in the dataset to a URI identifiable on the web of linked data in order to traverse the graph of data and extract new relevant, discriminative features for the task to solve?

Q2. Can we develop a hybrid technique that uses both the limited number of samples in the minority class and the knowledge about the minority class in order to generate new samples to balance the dataset? Moreover, how does this hybrid technique compare to techniques where only one of the two is used?

Q3. Is it possible to improve the feature selection phase by creating a new algorithm that ranks the different features based on their relations, as defined by an expert?

Finally, the following hypotheses can be formulated:

H1. The automatic discovery of new features by exploiting the concept of linked data can reduce the labor needed for feature extraction while increasing the predictive performance of the model.

H2. Balancing the dataset using both knowledge and the limited number of samples in the minority class will result in a better predictive performance for the minority class than sampling methods that are based only on this limited number of samples.

H3. Applying feature selection based on a ranked list of features, generated by applying a ranking algorithm to a graph of features defined by an expert, will require less time than current feature selection techniques and result in a better generalization capability. Moreover, it gives experts more control over the algorithm, which can increase their willingness to adopt such a system.

5 Methodology

5.1 Automatic Feature Discovery

In order to augment the data with information from the web of linked data, a mapping phase must first be applied. Here, the entities in the initial dataset are mapped to a URI identifiable on the web of linked data or, in the medical domain, in a semantically annotated electronic health record. This mapping has to occur with minimal user interaction. When each of the samples is mapped to a URI, we can try to find new features by performing a breadth-first search in the graph of linked data. The reason for a breadth-first strategy is the almost infinite depth of the graph. For a new candidate feature to be informative, it may not have too many missing values and it must correlate with the target variable (or improve the cluster quality in the unsupervised case). Since counting the number of missing values and calculating correlations between a new candidate predictor and the target variable for a large dataset can take a significant amount of time, a subset of the initial dataset can be used to provide an approximation. Moreover, to decide heuristically which feature-threshold combination results in the optimal split of the data among all possible candidates, the Hoeffding bound [33, 34] can be applied. Since the graph we are traversing is immense, we need to define conditions for when to stop the search, e.g. stop when we have traversed k levels deeper in the graph without finding a new usable feature. Finally, pruning of the graph can optionally be applied by calculating the semantic concept relatedness [35, 36] between a new subject and the target concept. When there is almost no semantic relation between a new concept and the target concept, that part of the graph can already be pruned. Many different metrics exist to calculate this relatedness [37, 38]. I will perform a thorough evaluation of the different metrics in order to find the one most suited for this task.
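The fragment below is a minimal sketch of the candidate evaluation step on a subsample: it scores candidate features by their absolute correlation with the target (an illustrative choice of merit function, not necessarily the one that will be used) and applies the Hoeffding bound to decide whether the best candidate is reliably better than the runner-up.

```python
# Minimal sketch: pick a candidate feature on a subsample using the Hoeffding bound.
import math

import numpy as np

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)), with R the range of the merit."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def best_candidate(candidates: dict, y: np.ndarray, delta: float = 1e-3):
    """candidates: feature name -> array of values on the subsample (may contain NaN)."""
    merits = {}
    for name, values in candidates.items():
        mask = ~np.isnan(values)                    # tolerate missing values
        if mask.sum() < 2 or values[mask].std() == 0:
            continue                                # constant or empty candidate
        merits[name] = abs(np.corrcoef(values[mask], y[mask])[0, 1])

    ranked = sorted(merits.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None
    if len(ranked) == 1:
        return ranked[0][0]

    eps = hoeffding_bound(value_range=1.0, delta=delta, n=len(y))  # |corr| lies in [0, 1]
    best, second = ranked[0], ranked[1]
    # Accept the best candidate only if its advantage exceeds the Hoeffding bound.
    return best[0] if best[1] - second[1] > eps else None
```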

For the headache use case, a user profile in the mobile journal needs to be mapped to the patient’s semantically annotated electronic health record. This can be done by joining on unique identifiable information such as the combination of name and email. As the electronic health record is semantically annotated, it can be seen as a graph, which can be traversed to discover new informative features that help in formulating a correct diagnosis for a primary headache patient. Moreover, datasets ideally suited for evaluation of this technique exist. Examples include the zoo dataset from UCI [39] and the datasets curated by the University of Mannheim [40]. The property of these datasets is that they contain a limited amount of information about rich concepts (such as cities or animals), and therefore rely on automatic feature discovery to obtain reasonable results.
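As a minimal sketch of this mapping step, the fragment below joins journal profiles to hypothetical, semantically annotated health records on name and email; all column names, values and the record URI are illustrative assumptions.

```python
# Minimal sketch: map journal profiles to (hypothetical) EHR entry points by joining on name + email.
import pandas as pd

profiles = pd.DataFrame({
    "name": ["An Peeters"], "email": ["an@example.org"], "attacks_per_month": [9],
})
ehr = pd.DataFrame({
    "name": ["An Peeters"], "email": ["an@example.org"],
    "record_uri": ["http://example.org/ehr/patient/123"],  # entry point for graph traversal
})

mapped = profiles.merge(ehr, on=["name", "email"], how="inner")
# 'record_uri' can now be fed to the feature-discovery traversal from Subsect. 5.1.
```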

5.2 Class Balancing

In order to balance the classes, oversampling as a pre-processing step will be investigated, enabling a model-agnostic approach. I will create a hybrid approach that combines the positive characteristics of data-based sampling algorithms, such as SMOTE [22] and ADASYN [23], and knowledge-based sampling algorithms, where samples are generated that comply with a pre-defined knowledge base. First, the consistency of the knowledge base and the given data needs to be checked by evaluating whether the small number of samples in the minority class complies with this knowledge. If this is not the case, there is either an anomaly in the data or an inconsistency or fault in the knowledge base that needs to be resolved. When a certain fraction of the samples in the minority class does not comply with one specific rule in the knowledge base, chances are high that the rule is inconsistent with the ground truth and we can remove the rule. Otherwise, the sample is probably an anomaly and can therefore be removed. Alternatively, both the rule and the sample can be removed. An evaluation is required to determine which strategy (and which threshold on the fraction of samples) is most suited for a dataset with certain properties. After this phase, data can be generated based on both the knowledge base and the small number of samples in our dataset. For each dimension (i.e. feature) for which knowledge is available, the corresponding value of a new virtual sample is set to a value that complies with this defined knowledge (e.g. the value must lie in a certain range). Of course, it is infeasible to have complete information about every dimension. For the remaining dimensions, the values of samples in our dataset can be used as follows: we find the two nearest neighbors of our new virtual sample in the feature space defined by the features for which knowledge is available; then we generate a random point on the line segment between these two neighbors.
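The fragment below is a minimal sketch of this hybrid generation step, under the assumption that the relevant knowledge can be reduced to per-feature (min, max) ranges for the minority class; the remaining dimensions are interpolated between the two nearest minority neighbors, SMOTE-style. The exact form of the knowledge base and the neighbor count are illustrative choices.

```python
# Minimal sketch of the hybrid sampler: knowledge-constrained dimensions are drawn
# from expert-defined ranges, the rest is interpolated between minority neighbours.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def generate_virtual_sample(X_min: np.ndarray, ranges: dict, rng=None) -> np.ndarray:
    """X_min: minority samples (n x d). ranges: feature index -> (low, high)."""
    rng = np.random.default_rng() if rng is None else rng
    d = X_min.shape[1]
    sample = np.empty(d)

    known = sorted(ranges)                          # dimensions covered by knowledge
    for j in known:
        low, high = ranges[j]
        sample[j] = rng.uniform(low, high)          # comply with the expert-defined range

    # Find the two minority neighbours closest in the subspace of 'known' dimensions.
    nn = NearestNeighbors(n_neighbors=2).fit(X_min[:, known])
    _, idx = nn.kneighbors(sample[known].reshape(1, -1))
    a, b = X_min[idx[0][0]], X_min[idx[0][1]]

    # Interpolate the remaining dimensions on the segment between the two neighbours.
    unknown = [j for j in range(d) if j not in ranges]
    t = rng.uniform()
    sample[unknown] = a[unknown] + t * (b[unknown] - a[unknown])
    return sample
```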

One of the most severe primary headache types is cluster headache. It has been discovered quite recently and is a rather rare condition, with a prevalence of 1 out of 1000 [41] as opposed to 1 out of 7 for migraine [42], making it very hard to diagnose. This, in combination with the fact that a lot of domain knowledge is available [11], makes it an ideal use case to evaluate the new technique on.

5.3 Feature Selection

I will design a method that makes it possible to represent the knowledge base as a graph, where each feature defined in the knowledge base or dataset corresponds to a node, and each relation between two features (such as dependsOn or independentOf) corresponds to an edge between their two corresponding nodes. I will then adapt a ranking algorithm, similar to e.g. Google PageRank [43], to calculate a weight for each of the nodes (or features) in the graph [44,45,46]. Finally, we can sort the features by their rank and return the top k features [47]. An example is given in Fig. 3.

Fig. 3. Feature selection by applying a technique similar to PageRank to the knowledge graph of features.
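As a minimal sketch of this ranking step, the fragment below builds a small feature graph, analogous to Fig. 3, with networkx and ranks the nodes with PageRank; the feature names and dependsOn relations are purely illustrative assumptions.

```python
# Minimal sketch: PageRank-based feature ranking on an expert-defined feature graph.
import networkx as nx

G = nx.DiGraph()
# An edge A -> B encodes "A dependsOn B": B receives rank from A.
G.add_edges_from([
    ("attack_duration", "headache_type"),
    ("pain_location", "headache_type"),
    ("medicine_intake", "attack_duration"),
    ("trigger_food", "medicine_intake"),
])

ranks = nx.pagerank(G, alpha=0.85)
top_k = sorted(ranks, key=ranks.get, reverse=True)[:3]
print(top_k)  # features that many others depend on are ranked highest
```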

For the headache use case, the newly discovered features (see Subsect. 5.1) and their corresponding descriptions, in combination with the features obtained from the semantically annotated information produced by the mobile headache journal, can be visualized for a neurologist in a GUI. The neurologist can then define relations between these features, analogous to Fig. 3. Finally, the ranking algorithm can be applied to create a list of features, ordered by their importance. This technique can easily be compared to other feature ranking techniques by taking the k top-ranked features of both approaches and measuring the predictive performance of a model trained on these features.

6 Evaluation

To evaluate the impact of prior knowledge incorporation in each of the phases, a comparison will be done between the process with and without incorporation regarding the following criteria (sorted by decreasing priority):

  • predictive performance of the model: by calculating the accuracy, balanced accuracy, precision, recall, AUC, F-measure, etc. (a minimal scoring sketch follows this list)

  • predictive model complexity: by visual inspection and counting the maximal depth, number of nodes or leaves in the resulting decision tree

  • computational time: by timing the execution of each of the phases in the machine learning process
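As a minimal sketch of the first criterion, the fragment below computes the listed metrics with scikit-learn; y_true, y_pred and y_score are placeholders for the outputs of whichever pipeline variant (with or without knowledge incorporation) is being evaluated.

```python
# Minimal sketch: scoring one pipeline variant with the metrics listed above.
from sklearn.metrics import (accuracy_score, balanced_accuracy_score, f1_score,
                             precision_score, recall_score, roc_auc_score)

def score_variant(y_true, y_pred, y_score) -> dict:
    """y_pred: hard class labels; y_score: predicted probability of the positive class."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_score),
    }
```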

The evaluation will be done for both incorporation in each phase separately and incorporation in all (possible subsets) of the phases. To take the no-free-lunch theorem [48] into account, the evaluation will be done on multiple benchmark datasets with varying characteristics.

7 Conclusion

In this research proposal, related work and methodologies are presented to incorporate prior background and expert knowledge, represented using Semantic Web technologies, into the first phases of a white-box machine learning approach: data balancing and feature extraction & selection. We are convinced that the incorporation of prior knowledge into these phases will allow for higher predictive performance and reduced training times. An evaluation regarding computational time, model complexity and predictive performance will be done by comparing the process with and without incorporation on multiple benchmark datasets and a real-world use case.