Data Preprocessing and Data Mining as Generalization
 Anita Wasilewska,
 Ernestina Menasalvas
Summary
We present an abstract model in which the data preprocessing and data mining proper stages of the Data Mining process are described as two different types of generalization. In the model, data mining and data preprocessing algorithms are defined as certain generalization operators. We use our framework to show that only three Data Mining operators (classification, clustering, and association) are needed to express all Data Mining algorithms for classification, clustering, and association, respectively. We are also able to show formally that the generalization that occurs in the preprocessing stage is different from the generalization inherent in the data mining proper stage.
 Book Title
 Data Mining: Foundations and Practice
 Pages
 pp 469–484
 Copyright
 2008
 DOI
 10.1007/978-3-540-78488-3_27
 Print ISBN
 978-3-540-78487-6
 Online ISBN
 978-3-540-78488-3
 Series Title
 Studies in Computational Intelligence
 Series Volume
 118
 Series ISSN
 1860-949X
 Publisher
 Springer Berlin Heidelberg
 Copyright Holder
 Springer Berlin Heidelberg
 Editors

 Dr. Tsau Young Lin ^{(2)}
 Dr. Ying Xie ^{(3)}
 Dr. Anita Wasilewska ^{(4)}
 Dr. Churn-Jung Liau ^{(5)}
 Editor Affiliations

 2. Department of Computer Science, San Jose State University
 3. Department of Computer Science and Information Systems, Kennesaw State University
 4. Department of Computer Science, The University at Stony Brook
 5. Institute of Information Science, Academia Sinica
 Authors

 Anita Wasilewska ^{(6)}
 Ernestina Menasalvas ^{(7)}
 Author Affiliations

 6. Department of Computer Science, State University of New York, Stony Brook, NY, USA
 7. Departamento de Lenguajes y Sistemas Informaticos Facultad de Informatica, U.P.M, Madrid, Spain