An Approach to Data Reduction and Integrated Machine Classification

Abstract

The goal of this paper is to propose a novel approach to integrated machine classification and to investigate the effect of integrating the data reduction stage with the data mining stage. Integrating these two key steps of knowledge discovery in databases is recognized as vital to improving the effectiveness of the data mining effort. After introducing the data reduction and integration schemes, a solution to the integrated classification problem is proposed. The proposed algorithm integrates data reduction, carried out through simultaneous instance and feature selection, with the learning process, using population-based and A-Team techniques. To validate the proposed approach and to investigate the effect of data reduction combined with different integration schemes, a computational experiment has been carried out. The experiment, based on several benchmark datasets, has shown that integrated data reduction and classifier learning outperform traditional approaches.
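
The central idea of the abstract can be illustrated concretely: data reduction is performed through simultaneous instance and feature selection, and this selection is driven by a population-based search rather than carried out as a separate preprocessing step. The following minimal sketch (Python, assuming a simple 1-NN evaluator, a fixed fitness weighting, and mutation rates chosen only for illustration) shows one way such a joint search over instance and feature subsets could look; it is not the paper's A-Team algorithm, which coordinates multiple optimizing agents.

```python
import numpy as np

def knn_accuracy(train_X, train_y, test_X, test_y):
    """Classify each validation point with 1-NN over the reduced training set."""
    if len(train_X) == 0:
        return 0.0
    correct = 0
    for x, label in zip(test_X, test_y):
        dists = np.sum((train_X - x) ** 2, axis=1)
        correct += int(train_y[np.argmin(dists)] == label)
    return correct / len(test_y)

def fitness(inst_mask, feat_mask, X, y, X_val, y_val, alpha=0.95):
    """Score a joint instance/feature subset: mostly validation accuracy,
    plus a small reward for reducing the data (the weighting is an assumption)."""
    if inst_mask.sum() == 0 or feat_mask.sum() == 0:
        return 0.0
    acc = knn_accuracy(X[inst_mask][:, feat_mask], y[inst_mask],
                       X_val[:, feat_mask], y_val)
    reduction = 1.0 - (inst_mask.mean() + feat_mask.mean()) / 2.0
    return alpha * acc + (1.0 - alpha) * reduction

def select_instances_and_features(X, y, X_val, y_val,
                                  pop_size=30, generations=50, seed=0):
    """Simple population-based search over joint instance/feature masks."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Each individual is a pair of boolean masks: selected instances, selected features.
    pop = [(rng.random(n) < 0.5, rng.random(m) < 0.5) for _ in range(pop_size)]
    score = lambda s: fitness(s[0], s[1], X, y, X_val, y_val)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[:pop_size // 2]
        children = [(inst ^ (rng.random(n) < 0.02),   # flip a few instance bits
                     feat ^ (rng.random(m) < 0.05))   # flip a few feature bits
                    for inst, feat in parents]
        pop = parents + children
    return max(pop, key=score)
```

In a fully integrated setting, the 1-NN evaluator above would be replaced by the target learner itself, so that the reduced training set is selected with respect to the classifier that will ultimately use it.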

Author information

Corresponding author

Correspondence to Ireneusz Czarnowski.

About this article

Cite this article

Czarnowski, I., Jȩdrzejowicz, P. An Approach to Data Reduction and Integrated Machine Classification. New Gener. Comput. 28, 21–40 (2010). https://doi.org/10.1007/s00354-008-0073-5
