Data Preprocessing and Data Mining as Generalization

  • Anita Wasilewska
  • Ernestina Menasalvas
Part of the Studies in Computational Intelligence book series (SCI, volume 118)

Summary

We present an abstract model in which the data preprocessing and data mining proper stages of the Data Mining process are described as two different types of generalization. In the model, data mining and data preprocessing algorithms are defined as certain generalization operators. We use our framework to show that only three Data Mining operators (the classification, clustering, and association operators) are needed to express all Data Mining algorithms for classification, clustering, and association, respectively. We are also able to show formally that the generalization that occurs in the preprocessing stage is different from the generalization inherent to the data mining proper stage.

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Anita Wasilewska (1)
  • Ernestina Menasalvas (2)
  1. Department of Computer Science, State University of New York, Stony Brook, USA
  2. Departamento de Lenguajes y Sistemas Informaticos, Facultad de Informatica, U.P.M, Madrid, Spain
