Abstract
This paper addresses data reduction through instance selection. It proposes an approach in which instances are selected from clusters, with the selection and learning process executed by a team of agents. The approach aims at a compact representation of the dataset, with an upper bound on its size set by the user. The basic assumption is that instance selection is carried out after the training data have been grouped into clusters. Cluster initialization and integration strategies are proposed and experimentally evaluated.
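The idea of selecting instances only after the data have been clustered can be illustrated with a minimal Python sketch. It is not the agent-based procedure of the paper: it assumes the clusters are already given as lists of row indices, assumes one representative is kept per cluster, and swaps in a simple nearest-to-centroid rule for the selection step. The names `centroid`, `select_from_clusters`, and `max_size` are hypothetical and chosen here for illustration only.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a non-empty list of numeric vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def select_from_clusters(X, clusters, max_size):
    """Return one representative index per cluster (illustrative rule only).

    X        -- list of feature vectors
    clusters -- list of clusters, each a non-empty list of indices into X
    max_size -- user-defined upper bound on the size of the reduced dataset
    """
    # Assumption: one instance is kept per cluster, so the number of
    # clusters must not exceed the user-defined bound.
    if len(clusters) > max_size:
        raise ValueError("more clusters than the user-defined upper bound")
    selected = []
    for members in clusters:
        center = centroid([X[i] for i in members])
        # Stand-in selection rule: keep the member closest to the centroid.
        selected.append(min(members, key=lambda i: dist(X[i], center)))
    return selected


# Toy data: three well-separated groups, already clustered.
X = [[0.0, 0.0], [0.2, 0.0], [0.1, 0.1],
     [5.0, 5.0], [6.0, 5.0], [5.4, 5.0],
     [9.0, 9.0]]
clusters = [[0, 1, 2], [3, 4, 5], [6]]
reduced = select_from_clusters(X, clusters, max_size=3)
```

In this sketch the size of the reduced dataset equals the number of clusters, so the user-defined bound is enforced by limiting how many clusters are passed in; merging clusters to meet that bound is left out.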
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Czarnowski, I., Jędrzejowicz, P. (2010). Cluster Integration for the Cluster-Based Instance Selection. In: Pan, JS., Chen, SM., Nguyen, N.T. (eds) Computational Collective Intelligence. Technologies and Applications. ICCCI 2010. Lecture Notes in Computer Science(), vol 6421. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16693-8_37
DOI: https://doi.org/10.1007/978-3-642-16693-8_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16692-1
Online ISBN: 978-3-642-16693-8
eBook Packages: Computer Science (R0)