Data classification through an evolutionary approach based on multiple criteria

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Real-world problems usually involve a huge volume of imprecise data. Such problems can challenge case-based reasoning systems, because these systems rely on knowledge extracted from the data to identify analogies and solve new problems. Many authors have focused on organizing the case memory into patterns to reduce the computational burden and to deal with uncertainty. The organization is usually determined by a single criterion, but in some problems a single criterion is insufficient to find accurate clusters. This work describes an approach that organizes the case memory into patterns based on multiple criteria. The approach uses the search capabilities of multiobjective evolutionary algorithms to build a Pareto set of solutions, where each solution is a possible organization reflecting a different trade-off among the objectives. The system shows promising capabilities when compared with a successful system based on self-organizing maps. Because the geometry of a data set influences the cluster-building process, the results are analyzed with that geometry taken into account. To this end, several complexity measures are used to categorize the data sets according to their topology.
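
The notion of a Pareto set of candidate organizations can be illustrated with a minimal sketch. It is not the authors' algorithm: it only shows how non-dominated solutions are selected once each candidate clustering has been scored. The two objectives and all scores below are hypothetical, and both objectives are assumed to be minimized.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b on every objective and
    strictly better on at least one (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_set(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical scores for five candidate case-memory organizations:
# (objective 1, objective 2), e.g. two cluster-quality criteria, lower is better.
candidates = [(0.2, 0.9), (0.4, 0.5), (0.5, 0.6), (0.8, 0.1), (0.9, 0.9)]
front = pareto_set(candidates)
print(front)
```

Each point left in `front` represents a different trade-off between the two criteria; choosing one of them is a separate decision that depends on how much each objective matters for the problem at hand.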

Author information

Corresponding author

Correspondence to A. Garcia-Piquer.

About this article

Cite this article

Garcia-Piquer, A., Fornells, A., Orriols-Puig, A. et al. Data classification through an evolutionary approach based on multiple criteria. Knowl Inf Syst 33, 35–56 (2012). https://doi.org/10.1007/s10115-011-0462-9

  • DOI: https://doi.org/10.1007/s10115-011-0462-9
