Synonyms
Data mining pipeline; Data mining process; KDD process
Definition
The KDD pipeline describes the complete process of knowledge discovery in databases (KDD), i.e. the process of deriving useful, valid and non-trivial patterns from a large amount of data. The pipeline consists of five consecutive steps:
Selection
The selection step identifies the goal of the current application and selects a data set that is likely to contain relevant patterns.
Preprocessing
The preprocessing step increases the quality of the data set by filling in missing attribute values, removing duplicate instances and resolving data inconsistencies.
Transformation
The transformation step removes correlated and irrelevant attributes and derives new, more meaningful attributes from the current data description.
Data Mining
This step selects a data mining algorithm suited to the goal identified in the selection step and uses it to derive patterns or learn functions that are valid for the current data set.
Ev...
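The four steps described above (selection, preprocessing, transformation, data mining) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not part of the entry: the toy records, the attribute names (`age`, `income`, `income_cents`, `buys`), the function names, the mean-imputation strategy, the 0.95 correlation cutoff, the derived attribute `income_per_age`, and the choice of a one-attribute threshold rule as the mining algorithm.

```python
from statistics import mean

def select(records, attributes):
    """Selection: keep only the attributes judged relevant to the goal."""
    return [{a: r[a] for a in attributes} for r in records]

def preprocess(records, numeric):
    """Preprocessing: drop duplicate instances, fill missing numeric values with the mean."""
    seen, unique = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(r))
    for a in numeric:
        fill = mean(r[a] for r in unique if r[a] is not None)
        for r in unique:
            if r[a] is None:
                r[a] = fill
    return unique

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def transform(records, numeric, derived, threshold=0.95):
    """Transformation: drop attributes highly correlated with a kept one, then derive new ones."""
    kept = []
    for a in numeric:
        col = [r[a] for r in records]
        if all(abs(pearson(col, [r[k] for r in records])) < threshold for k in kept):
            kept.append(a)
    dropped = set(numeric) - set(kept)
    out = []
    for r in records:
        row = {k: v for k, v in r.items() if k not in dropped}
        for name, fn in derived.items():
            row[name] = fn(row)
        out.append(row)
    return out

def mine(records, attribute, label, positive):
    """Data mining: learn the rule 'attribute >= t implies positive' with the best fit."""
    best_t, best_correct = None, -1
    for t in sorted({r[attribute] for r in records}):
        correct = sum((r[attribute] >= t) == (r[label] == positive) for r in records)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t, best_correct / len(records)

# Hypothetical toy data; None marks a missing value, ids 2 and 3 describe the same instance.
RAW = [
    {"id": 1, "age": 30, "income": 20.0, "income_cents": 2000.0, "buys": "no"},
    {"id": 2, "age": 52, "income": 24.0, "income_cents": 2400.0, "buys": "no"},
    {"id": 3, "age": 52, "income": 24.0, "income_cents": 2400.0, "buys": "no"},
    {"id": 4, "age": None, "income": 50.0, "income_cents": 5000.0, "buys": "yes"},
    {"id": 5, "age": 47, "income": 58.0, "income_cents": 5800.0, "buys": "yes"},
    {"id": 6, "age": 25, "income": 61.0, "income_cents": 6100.0, "buys": "yes"},
]

selected = select(RAW, ["age", "income", "income_cents", "buys"])
clean = preprocess(selected, numeric=["age", "income", "income_cents"])
features = transform(
    clean,
    numeric=["income", "income_cents", "age"],
    derived={"income_per_age": lambda r: r["income"] / r["age"]},
)
threshold, accuracy = mine(features, "income", "buys", positive="yes")
print(f"rule: buys == 'yes' if income >= {threshold} (fit on this data: {accuracy:.0%})")
```

The reported fit is measured on the same records the rule was learned from, so it only shows that the rule is valid for the current data set, in the sense used in the data mining step above; it says nothing about unseen data.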
Recommended Reading
Brachman R, Anand T. The process of knowledge discovery in databases: a human centered approach. In: Proceedings of the 10th National Conference on Artificial Intelligence; 1996. p. 37–8.
Fayyad U, Piatetsky-Shapiro G, Smyth P. From data mining to knowledge discovery in databases. In: Proceedings of the 10th National Conference on Artificial Intelligence; 1996. p. 1–30.
Fayyad U, Piatetsky-Shapiro G, Smyth P. Knowledge discovery and data mining: towards a unifying framework. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining; 1996. p. 82–8.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Kriegel, HP., Schubert, M. (2018). KDD Pipeline. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_1134
DOI: https://doi.org/10.1007/978-1-4614-8265-9_1134
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering