Distributed Implementation of an Intelligent Data Classifier
Industrial, scientific and business applications manipulate huge amounts of data every day. Most of the time these data come from distributed sources and are analyzed with Data Mining techniques to discover knowledge and recognize patterns. Data classification is a technique for deciding whether a set of data belongs to a given group of information. It conventionally requires gathering all the data into one large centralized dataset. Gathering and analyzing this dataset is very expensive in terms of time, memory and bandwidth. Architectures for Distributed Data Mining have been developed to reduce these computing and storage costs. This paper presents an approach to building a distributed data classifier that uses only metadata from the distributed datasets, avoiding full access to the original data. Using only metadata reduces the computing time and bandwidth consumption required to build a data classifier.
Keywords: Data Mining, Global Classifier
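The abstract does not say which metadata the approach exchanges or which classifier it builds, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes numeric features and uses a Gaussian naive Bayes model as a stand-in, because that model can be built entirely from per-class counts, sums and sums of squares. The names `SiteSummary`, `summarize`, `merge` and `predict` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from math import log, pi


@dataclass
class SiteSummary:
    """Hypothetical metadata a site shares: per-class counts, sums, sums of squares."""
    count: dict = field(default_factory=dict)
    total: dict = field(default_factory=dict)
    total_sq: dict = field(default_factory=dict)

    def _ensure(self, label, width):
        # Create zeroed accumulators the first time a class label is seen.
        if label not in self.count:
            self.count[label] = 0
            self.total[label] = [0.0] * width
            self.total_sq[label] = [0.0] * width


def summarize(rows, labels):
    """Runs locally at one site; only the returned summary leaves the site."""
    s = SiteSummary()
    for x, y in zip(rows, labels):
        s._ensure(y, len(x))
        s.count[y] += 1
        for j, v in enumerate(x):
            s.total[y][j] += v
            s.total_sq[y][j] += v * v
    return s


def merge(summaries):
    """Coordinator step: add up the per-site statistics, never the raw rows."""
    g = SiteSummary()
    for s in summaries:
        for y, n in s.count.items():
            g._ensure(y, len(s.total[y]))
            g.count[y] += n
            for j in range(len(s.total[y])):
                g.total[y][j] += s.total[y][j]
                g.total_sq[y][j] += s.total_sq[y][j]
    return g


def predict(g, x, eps=1e-9):
    """Gaussian naive Bayes prediction (an assumed model) from merged statistics."""
    n_all = sum(g.count.values())
    best, best_score = None, float("-inf")
    for y, n in g.count.items():
        score = log(n / n_all)  # log prior of the class
        for j, v in enumerate(x):
            mean = g.total[y][j] / n
            # Variance from sums of squares; eps guards against zero variance.
            var = max(g.total_sq[y][j] / n - mean * mean, 0.0) + eps
            score += -0.5 * log(2 * pi * var) - (v - mean) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = y, score
    return best


# Two sites summarize their own rows; the coordinator sees only the summaries.
site_a = summarize([[1.0, 2.0], [1.2, 1.8]], ["a", "a"])
site_b = summarize([[5.0, 6.0], [5.2, 5.8], [0.9, 2.1]], ["b", "b", "a"])
model = merge([site_a, site_b])
```

In this sketch each site transmits a few numbers per class and feature instead of its rows, which is where the savings in bandwidth and central computation would come from. A different classifier would need different summary statistics.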