
Knowledge and Information Systems, Volume 29, Issue 1, pp 1–24

Merging local patterns using an evolutionary approach

  • María C. Gaya
  • J. Ignacio Giráldez
Regular Paper

Abstract

This paper describes a Decentralized Agent-based model for Theory Synthesis (DATS), implemented by MASETS, a Multi-Agent System for Evolutionary Theory Synthesis. The paper makes four main contributions. First, it proposes a method for synthesizing a global theory from distributed local theories. Second, it introduces a conflict resolution mechanism, based on genetic algorithms, that handles collisions and contradictions in the knowledge discovered by different agents at their respective locations. Third, it provides a system-level classification procedure that improves on the results of both the monolithic classifier and the best local classifier. Fourth, it offers a method for mining very large datasets through divide-and-conquer mining followed by merging of the discoveries. The model is validated with an experimental application run on 15 datasets. The results show that the global theory outperforms all the local theories and the monolithic theory (obtained by mining the concatenation of all the available distributed data), and that these differences are statistically significant.
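
The abstract does not specify how the genetic algorithm encodes or scores a candidate global theory. As an illustration only, the Python sketch below assumes one common formulation, which is not necessarily the one used in MASETS. Each agent contributes an ordered list of classification rules. A chromosome is a bit mask over the union of all local rules. Fitness is the accuracy of the selected rules on a shared validation sample. The rule format, `merge_theories`, the validation sample, and the GA parameters (tournament selection, one-point crossover, bit-flip mutation, elitism) are all hypothetical choices made for the example.

```python
# Illustrative sketch only: NOT the DATS/MASETS implementation.
import random
from typing import Dict, List, Tuple

# Hypothetical rule format: (conditions, predicted_class), where conditions map
# attribute -> required value. An example is (attribute values, true class).
Rule = Tuple[Dict[str, str], str]
Example = Tuple[Dict[str, str], str]


def classify(rules: List[Rule], x: Dict[str, str], default: str) -> str:
    """Ordered rule list: the first rule whose conditions all hold decides."""
    for conds, label in rules:
        if all(x.get(attr) == val for attr, val in conds.items()):
            return label
    return default


def accuracy(rules: List[Rule], data: List[Example], default: str) -> float:
    hits = sum(classify(rules, x, default) == y for x, y in data)
    return hits / len(data)


def merge_theories(local_theories: List[List[Rule]], data: List[Example],
                   default: str, pop_size: int = 20, generations: int = 40,
                   p_mut: float = 0.05, seed: int = 0) -> Tuple[List[Rule], float]:
    """Hypothetical GA: choose a subset of the pooled local rules."""
    rng = random.Random(seed)
    pool = [r for theory in local_theories for r in theory]  # union of local rules
    n = len(pool)

    def decode(mask: List[int]) -> List[Rule]:
        return [r for r, bit in zip(pool, mask) if bit]

    def score(mask: List[int]) -> float:
        return accuracy(decode(mask), data, default)

    # Seed the population with each agent's own theory and with the plain union,
    # so that (thanks to elitism) the result is never worse than any of them on `data`.
    pop, start = [], 0
    for theory in local_theories:
        pop.append([1 if start <= i < start + len(theory) else 0 for i in range(n)])
        start += len(theory)
    pop.append([1] * n)
    while len(pop) < pop_size:
        pop.append([rng.randint(0, 1) for _ in range(n)])

    for _ in range(generations):
        new_pop = [max(pop, key=score)]  # elitism: carry over the best mask
        while len(new_pop) < pop_size:
            # Binary tournament selection of two parents.
            p1 = max(rng.sample(pop, 2), key=score)
            p2 = max(rng.sample(pop, 2), key=score)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            # Bit-flip mutation: toggle each rule in or out with probability p_mut.
            child = [1 - b if rng.random() < p_mut else b for b in child]
            new_pop.append(child)
        pop = new_pop

    best = max(pop, key=score)
    return decode(best), score(best)


# Usage with two toy agents whose rules collide on rainy, windy days.
agent_a = [({"outlook": "sunny"}, "no"),
           ({"outlook": "overcast"}, "yes"),
           ({"outlook": "rain"}, "yes")]
agent_b = [({"windy": "true"}, "no")]
validation = [
    ({"outlook": "sunny", "windy": "false"}, "no"),
    ({"outlook": "overcast", "windy": "true"}, "yes"),
    ({"outlook": "rain", "windy": "true"}, "no"),
    ({"outlook": "rain", "windy": "false"}, "yes"),
]
global_rules, acc = merge_theories([agent_a, agent_b], validation, default="yes")
print(acc, global_rules)
```

In this toy setup, the conflict is resolved by dropping the contradicting rule rather than by voting. Seeding the population with each local theory and with their union makes it easy to check that the merged theory is at least as accurate as any of them on the validation sample. Other choices, such as a fitness that penalizes rule count, or evaluation on held-out data, are equally plausible.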

Keywords

Multi-database mining, Genetic algorithms, Distributed data mining, Multi-agent systems


Copyright information

© Springer-Verlag London Limited 2010

Authors and Affiliations

  1. Department of Computer Systems and Automation, Universidad Europea de Madrid, Madrid, Spain