Abstract
Although preprocessing is one of the key issues in data analysis, it is still common practice to address this task by manually entering SQL statements and using a variety of stand-alone tools. The results are poorly documented and hard to reuse. The MiningMart system presented in this chapter focuses on setting up and reusing best-practice cases of preprocessing data stored in very large databases. A metadata model named M4 is used to declaratively define and document both the steps of such a preprocessing chain and all the data involved. Data and applied operators are each described at two levels: an abstract level, understandable by human users, and an executable level, used by the metadata compiler to run cases on given data sets. An integrated environment allows for rapid development of preprocessing chains. Adapting a case to a different environment only requires specifying the database entities involved in the target DBMS. This allows the reuse of best-practice cases published on the Internet.
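The split between an abstract level and an executable level can be illustrated with a small sketch. It is not the M4 schema or the MiningMart compiler: the class names (`Concept`, `RowSelection`, `Binding`), the function `compile_step`, and all table and column names are hypothetical, chosen only to show how one declaratively defined step might be translated to SQL for two different databases by swapping the binding.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# All names below are hypothetical illustrations, not part of M4 or MiningMart.


@dataclass(frozen=True)
class Concept:
    """Abstract-level description of a data set: a name and its features."""
    name: str
    features: Tuple[str, ...]


@dataclass(frozen=True)
class RowSelection:
    """One declarative step: keep the rows of a concept that pass a numeric test."""
    input_concept: Concept
    output_view: str
    feature: str       # abstract feature name used in the test
    comparison: str    # must be one of ALLOWED_COMPARISONS
    value: float       # numeric threshold (numbers only in this sketch)


@dataclass(frozen=True)
class Binding:
    """Executable-level mapping of a concept onto one concrete database."""
    table: str
    columns: Dict[str, str]  # abstract feature name -> column name


ALLOWED_COMPARISONS = {"<", "<=", "=", ">=", ">"}


def compile_step(step: RowSelection, binding: Binding) -> str:
    """Translate an abstract step into a SQL view definition for one binding."""
    if step.comparison not in ALLOWED_COMPARISONS:
        raise ValueError(f"unsupported comparison: {step.comparison}")
    # Expose concrete columns under their abstract feature names.
    select_list = ", ".join(
        f"{binding.columns[f]} AS {f}" for f in step.input_concept.features
    )
    return (
        f"CREATE VIEW {step.output_view} AS "
        f"SELECT {select_list} FROM {binding.table} "
        f"WHERE {binding.columns[step.feature]} {step.comparison} {step.value}"
    )


# The same abstract case, bound to two (made-up) database schemas.
customer = Concept("Customer", ("age", "income"))
adults = RowSelection(customer, "adult_customers", "age", ">=", 18)
site_a = Binding("crm_customers", {"age": "cust_age", "income": "yearly_income"})
site_b = Binding("kunden", {"age": "alter_jahre", "income": "einkommen"})

print(compile_step(adults, site_a))
print(compile_step(adults, site_b))
```

In this sketch the step definition never mentions a table or column, so reusing it at another site means writing a new `Binding` and nothing else, which mirrors the adaptation idea described in the abstract.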
Keywords
- Data Mining Algorithm
- Business Level
- Output Concept
- Conceptual Data Model
- Input Concept
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Morik, K., Scholz, M. (2004). The MiningMart Approach to Knowledge Discovery in Databases. In: Intelligent Technologies for Information Analysis. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-07952-2_3
DOI: https://doi.org/10.1007/978-3-662-07952-2_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-07378-6
Online ISBN: 978-3-662-07952-2
eBook Packages: Springer Book Archive