Mining the Hidden Structure of Inductive Learning Data Sets

Japkowicz, Nathalie

doi:10.1007/978-3-642-30353-1_30

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7310)

Included in the following conference series:

Canadian Conference on Artificial Intelligence

Abstract

This paper proposes a method for extracting the hidden characteristics of machine learning domains. It does so by evaluating the performance of various classifiers on these domains as well as on artificial data whose characteristics are known, since they were purposely included in the generation process. The results obtained on both the real and artificial data are analyzed together using a dendrogram, a classical visualization tool for hierarchical clustering. The idea is to map the real-world domains to the artificial ones according to how well they are learnt by a variety of classifiers and, through this relationship, extract their characteristics. The method can determine how difficult it is to classify a specific domain and whether this difficulty stems from the complexity of the concept it embodies, the amount of overlap between classes, the dearth of training data, or its dimensionality. This is an important contribution because it allows researchers to understand the underlying nature of their data and thus converge quickly toward novel, well-adapted solutions to their particular problems.
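The mapping idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the domain names, the accuracy values, the use of three classifiers, the Euclidean distance, and the average-linkage rule are all assumptions made for illustration. Each domain is represented by a vector of classifier accuracies, the domains are merged bottom-up (the merge order is what a dendrogram would draw), and each real domain is matched to the artificial domain with the closest performance profile.

```python
from itertools import combinations
from math import dist

# Hypothetical accuracy profiles, one value per classifier.
# These numbers are invented for illustration; they are NOT results from the paper.
ARTIFICIAL = {
    "art_simple": [0.98, 0.97, 0.99],
    "art_overlap": [0.70, 0.72, 0.69],
    "art_small_sample": [0.85, 0.60, 0.75],
}
REAL = {
    "real_A": [0.97, 0.95, 0.97],
    "real_B": [0.72, 0.70, 0.71],
}


def average_linkage(profiles):
    """Agglomerative clustering with average linkage (an assumed choice).

    Returns the merges as (left_members, right_members, distance) in the
    order they occur, which is the information a dendrogram displays.
    """
    clusters = [(name,) for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        def cluster_distance(pair):
            # Mean pairwise Euclidean distance between the two clusters.
            a, b = pair
            total = sum(dist(profiles[x], profiles[y]) for x in a for y in b)
            return total / (len(a) * len(b))

        a, b = min(combinations(clusters, 2), key=cluster_distance)
        merges.append((a, b, cluster_distance((a, b))))
        merged = tuple(sorted(a + b))
        clusters = [c for c in clusters if c not in (a, b)] + [merged]
    return merges


def nearest_artificial(real, artificial):
    """Map each real domain to the artificial domain with the closest profile."""
    return {
        name: min(artificial, key=lambda art: dist(profile, artificial[art]))
        for name, profile in real.items()
    }


merges = average_linkage({**ARTIFICIAL, **REAL})
mapping = nearest_artificial(REAL, ARTIFICIAL)

for left, right, height in merges:
    print(f"merge {left} + {right} at distance {height:.3f}")
print(mapping)
```

Under these assumed profiles, a real domain that lands next to an artificial domain generated with heavy class overlap would be read as suffering mainly from overlap. In practice a library routine for hierarchical clustering would replace the hand-written loop.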

Author information

Authors and Affiliations

School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Ontario, Canada
Nathalie Japkowicz

Editor information

Editors and Affiliations

Faculty of Engineering and Computer Science, Department of Computer Science and Software Engineering, Concordia University, H3G 1M8, Montreal, QC, Canada
Leila Kosseim
Faculty of Engineering, School of Electrical Engineering and Computer Science, University of Ottawa, K1N 6N5, Ottawa, ON, Canada
Diana Inkpen

About this paper

Cite this paper

Japkowicz, N. (2012). Mining the Hidden Structure of Inductive Learning Data Sets. In: Kosseim, L., Inkpen, D. (eds) Advances in Artificial Intelligence. Canadian AI 2012. Lecture Notes in Computer Science, vol 7310. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30353-1_30

DOI: https://doi.org/10.1007/978-3-642-30353-1_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30352-4
Online ISBN: 978-3-642-30353-1
eBook Packages: Computer Science, Computer Science (R0)
