Distributed Implementation of an Intelligent Data Classifier

  • Victor J. Sosa-Sosa
  • Ivan Lopez-Arevalo
  • Omar Jasso-Luna
  • Hector Fraire-Huacuja
Part of the Studies in Computational Intelligence book series (SCI, volume 312)


Industry, science, and business applications must manipulate huge amounts of data every day. Most of the time these data come from distributed sources and are analyzed with Data Mining techniques to discover knowledge and recognize patterns. Data classification is a technique that decides whether a set of data belongs to a given group of information. It usually requires gathering all the data into one large centralized dataset. Gathering and analyzing this dataset is a very expensive task in terms of time, memory, and bandwidth consumption. Architectures for Distributed Data Mining have been developed to reduce these computing and storage costs. This paper presents an approach to building a distributed data classifier that takes only metadata from the distributed datasets, avoiding full access to the original data. Using only metadata reduces the computing time and bandwidth consumption required to build a data classifier.
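
The abstract does not say what the metadata contains, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes each site shares per-class counts of (attribute, value) pairs rather than raw rows, and that a coordinator sums those counts and picks the first split of a decision tree by information gain. The function names (`site_metadata`, `merge_metadata`, `information_gain`, `best_split`) and the toy records are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2

CLASS_KEY = "__class__"  # assumed marker for overall class counts

def site_metadata(rows, class_attr):
    """Runs locally at one site: count (attribute, value, class) triples."""
    counts = Counter()
    for row in rows:
        label = row[class_attr]
        counts[(CLASS_KEY, None, label)] += 1
        for attr, value in row.items():
            if attr != class_attr:
                counts[(attr, value, label)] += 1
    return counts

def merge_metadata(site_counts):
    """Runs at the coordinator: sum the count tables sent by all sites."""
    total = Counter()
    for counts in site_counts:
        total.update(counts)
    return total

def entropy(class_counts):
    """Shannon entropy (in bits) of a list of class counts."""
    n = sum(class_counts)
    return -sum((c / n) * log2(c / n) for c in class_counts if c > 0)

def information_gain(counts, attr):
    """Information gain of splitting on attr, computed from counts only."""
    class_totals = [c for (a, _, _), c in counts.items() if a == CLASS_KEY]
    n = sum(class_totals)
    by_value = {}
    for (a, value, _label), c in counts.items():
        if a == attr:
            by_value.setdefault(value, []).append(c)
    remainder = sum(sum(cs) / n * entropy(cs) for cs in by_value.values())
    return entropy(class_totals) - remainder

def best_split(counts):
    """Attribute with the highest information gain (ties: alphabetical)."""
    attrs = sorted({a for (a, _, _) in counts if a != CLASS_KEY})
    return max(attrs, key=lambda a: information_gain(counts, a))

# Toy records held at two separate sites (invented for illustration).
site_a = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
]
site_b = [
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
]

# Only the count tables travel over the network, never the rows.
merged = merge_metadata([site_metadata(site_a, "play"),
                         site_metadata(site_b, "play")])
print(best_split(merged))
```

Because the counts are additive, the merged table is identical to the one that would be computed on the pooled rows, so the split chosen here matches the centralized choice. In this sketch the amount of data each site sends depends on the number of distinct attribute values and classes rather than on the number of rows.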


Data Mining Global Classifier 





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Victor J. Sosa-Sosa (1)
  • Ivan Lopez-Arevalo (1)
  • Omar Jasso-Luna (1)
  • Hector Fraire-Huacuja (2)
  1. Ciudad Victoria, Centro de Investigación y de Estudios Avanzados del IPN (CINVESTAV), Tamaulipas, México
  2. Instituto Tecnológico de Ciudad Madero, Tamaulipas, México
