Abstract
This paper proposes a method for extracting the hidden characteristics of machine learning domains. It does so by evaluating the performance of various classifiers on these domains as well as on artificial data whose characteristics are known, since they were deliberately built into the generation process. The results obtained on the real and artificial data are analyzed jointly using a classical visualization tool for hierarchical clustering, the dendrogram. The idea is to map each real-world domain to the artificial ones according to how well it is learnt by a variety of classifiers and, through this relationship, to infer its characteristics. The method can determine how difficult a specific domain is to classify and whether this difficulty stems from the complexity of the concept it embodies, the amount of overlap between classes, a dearth of training data, or its dimensionality. This is an important contribution because it allows researchers to understand the underlying nature of their data and thus converge quickly toward novel, well-adapted solutions to their particular problems.
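The clustering step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes each domain (real or artificial) has already been summarized as a vector of classifier scores (for example, cross-validated accuracy of several learners), uses Euclidean distance with average linkage, and reads the resulting merge order as a dendrogram would. The domain names, classifier order, and score values below are hypothetical placeholders, not results from the paper. A real domain is mapped to the artificial domain(s) that first join its branch of the tree.

```python
import math
from itertools import combinations

# Hypothetical score profiles: one entry per domain, one score per classifier
# (assumed order: NaiveBayes, C4.5, 1-NN, SVM). Made-up values for illustration.
PROFILES = {
    "art_simple":  [0.97, 0.99, 0.98, 0.99],  # artificial: simple concept
    "art_complex": [0.70, 0.92, 0.90, 0.85],  # artificial: complex concept
    "art_overlap": [0.75, 0.74, 0.68, 0.76],  # artificial: overlapping classes
    "real_A":      [0.72, 0.90, 0.88, 0.84],  # real-world domain (hypothetical)
    "real_B":      [0.74, 0.72, 0.70, 0.75],  # real-world domain (hypothetical)
}
ARTIFICIAL = {"art_simple", "art_complex", "art_overlap"}


def euclidean(a, b):
    """Euclidean distance between two score profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles):
    """Agglomerative clustering with average linkage.

    Returns the merges in order as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a frozenset of domain names. This merge list is
    the information a dendrogram draws.
    """
    clusters = [frozenset([name]) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Average pairwise distance between the members of a and b.
            d = sum(euclidean(profiles[x], profiles[y]) for x in a for y in b)
            d /= len(a) * len(b)
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
        merges.append((a, b, d))
    return merges


def artificial_partners(real_name, merges, artificial_names):
    """Return the artificial domains that first join real_name's branch."""
    for a, b, _ in merges:
        if real_name in a:
            other = b
        elif real_name in b:
            other = a
        else:
            continue
        found = {n for n in other if n in artificial_names}
        if found:
            return found
    return set()


if __name__ == "__main__":
    merges = average_linkage(PROFILES)
    for a, b, d in merges:
        print(sorted(a), "+", sorted(b), f"at {d:.3f}")
    for real in ("real_A", "real_B"):
        print(real, "->", sorted(artificial_partners(real, merges, ARTIFICIAL)))
```

With these toy profiles, `real_A` lands next to the complex-concept domain and `real_B` next to the overlapping-classes domain, which is the kind of reading the abstract describes. In practice, `scipy.cluster.hierarchy.linkage(..., method="average")` together with `scipy.cluster.hierarchy.dendrogram` would replace the hand-written loop; it is written out here only to keep the sketch dependency-free.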
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Japkowicz, N. (2012). Mining the Hidden Structure of Inductive Learning Data Sets. In: Kosseim, L., Inkpen, D. (eds) Advances in Artificial Intelligence. Canadian AI 2012. Lecture Notes in Computer Science, vol 7310. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30353-1_30
DOI: https://doi.org/10.1007/978-3-642-30353-1_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30352-4
Online ISBN: 978-3-642-30353-1
eBook Packages: Computer Science, Computer Science (R0)