An Ordered Preprocessing Scheme for Data Mining

  • Laura Cruz R.
  • Joaquín Pérez
  • Vanesa Landero N.
  • Elizabeth S. del Angel
  • Victor M. Álvarez
  • Verónica Peréz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3157)

Abstract

Data preprocessing plays an important role in many data mining processes. The widely adopted practice in this area is to apply only a single preprocessing method, such as discretization. In this paper we propose an ordered scheme that combines several important data preprocessing methods. The aim is to increase the accuracy of the most widely used classification algorithms. The experimental results show that the proposed scheme performs better than the classical scheme.
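The idea of chaining preprocessing steps in a fixed order can be illustrated with a minimal sketch. The abstract does not state which methods are combined or in what order, so everything below is an assumption made for illustration: the two steps (mean imputation, then equal-width discretization), their order, and the names `impute_mean`, `discretize_equal_width`, `run_pipeline` and `ORDERED_STEPS` are hypothetical and are not the authors' scheme.

```python
from statistics import mean


def impute_mean(rows):
    """Replace None in each column with the mean of that column's observed values."""
    cols = list(zip(*rows))
    means = [mean(v for v in col if v is not None) for col in cols]
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def discretize_equal_width(rows, bins=3):
    """Map each numeric value to an equal-width bin index in 0..bins-1, per column."""
    out_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant columns
        out_cols.append([min(int((v - lo) / width), bins - 1) for v in col])
    return [list(r) for r in zip(*out_cols)]


def run_pipeline(rows, steps):
    """Apply each preprocessing step in the given order."""
    for step in steps:
        rows = step(rows)
    return rows


# Assumed order for illustration only: fill missing values before discretizing.
ORDERED_STEPS = [impute_mean, discretize_equal_width]

data = [[1.0, 10.0], [None, 20.0], [3.0, None], [5.0, 40.0]]
result = run_pipeline(data, ORDERED_STEPS)
print(result)
```

In this sketch the order matters for a practical reason: the discretization step computes a minimum and maximum per column, which fails on missing values, so imputation has to run first. A real scheme would also have to decide where steps such as sampling and feature selection belong.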

Keywords

Data Mining, Feature Selection, Preprocessing Method, Feature Subset Selection, Proteomic Pattern

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Laura Cruz R. (1)
  • Joaquín Pérez (1)
  • Vanesa Landero N. (1)
  • Elizabeth S. del Angel (1)
  • Victor M. Álvarez (1)
  • Verónica Peréz (1)

  1. Instituto Tecnológico de Ciudad Madero, México