Data Reduction for Instance-Based Learning Using Entropy-Based Partitioning

  • Seung-Hyun Son
  • Jae-Yearn Kim
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3982)


Instance-based learning methods such as the nearest neighbor classifier have proven to perform well in pattern classification in several fields. Despite their high classification accuracy, they suffer from high storage requirements, high computational cost, and sensitivity to noise. In this paper, we present a data reduction method for instance-based learning, based on entropy-based partitioning and representative instances. Experimental results show that the new algorithm achieves both a high data reduction rate and high classification accuracy.
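The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes numeric attributes, binary splits at the cut point that minimizes weighted class entropy, the partition mean as the representative instance, and a 1-nearest-neighbor rule with Euclidean distance over the representatives. The function names (`entropy`, `best_split`, `reduce_data`, `classify`) are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(X, y):
    """Return (attribute index, threshold) that lowers weighted entropy most, or None."""
    best, best_score = None, entropy(y)
    n = len(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate cut points are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (j, t), score
    return best

def reduce_data(X, y):
    """Partition recursively; return one (mean vector, majority label) per partition.

    Assumption: a partition is final when it is pure or when no single
    split reduces its class entropy.
    """
    split = best_split(X, y) if len(set(y)) > 1 else None
    if split is None:
        mean = [sum(col) / len(X) for col in zip(*X)]
        return [(mean, Counter(y).most_common(1)[0][0])]
    j, t = split
    reps = []
    for keep in (lambda v: v <= t, lambda v: v > t):
        rows = [r for r in X if keep(r[j])]
        labs = [l for r, l in zip(X, y) if keep(r[j])]
        reps += reduce_data(rows, labs)
    return reps

def classify(reps, x):
    """Label x with the class of its nearest representative (Euclidean distance)."""
    return min(reps, key=lambda rep: math.dist(rep[0], x))[1]

# Toy data: two well-separated classes of three instances each.
X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]]
y = ["a", "a", "a", "b", "b", "b"]
reps = reduce_data(X, y)
print(len(reps), classify(reps, [1.1, 1.0]), classify(reps, [4.5, 5.5]))
```

On this toy data the six training instances collapse to one representative per class, which illustrates the trade-off the abstract describes: the stored set shrinks while nearby queries are still assigned to the expected class.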


Keywords: Data Reduction, Irrelevant Attribute, Euclidean Distance Measure, Data Reduction Method, Representative Instance





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Seung-Hyun Son (1)
  • Jae-Yearn Kim (1)
  1. Department of Industrial Engineering, Hanyang University, Seoul, South Korea
