A Parallel Implementation of the K Nearest Neighbours Classifier in Three Levels: Threads, MPI Processes and the Grid

  • G. Aparício
  • I. Blanquer
  • V. Hernández
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4395)


The work described in this paper tackles the problem of data mining and classification of large amounts of data using the K nearest neighbours (KNN) classifier [1]. The large computing demand of this process is addressed with a parallel implementation specially designed to work in Grid environments of multiprocessor computer farms. None of the individual parallel computing approaches (intra-node, inter-node and inter-organisation) is sufficient by itself to meet the computing demand of such a large problem. Instead of using these techniques separately, we propose to combine all three, taking into account the parallelism grain of the different parts of the problem. The main purpose is to complete a job requiring one month of CPU time in a few hours. The technologies used are the EGEE Grid Computing Infrastructure running the Large Hadron Collider Computing Grid (LCG 2.6) middleware [3], MPI [4] [5] and POSIX threads [6]. Finally, we compare the results with those obtained using the most popular and widely used tools, to show the importance of this strategy.

Topics: Grid, Parallel Computing, Threads and Data Mining.
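
The abstract does not say how the classification work is divided among workers, so the following is only a minimal sketch of one plausible decomposition: the query set is split into contiguous blocks and each block is classified independently against the full training set. All names (`knn_classify`, `classify_chunk`, `split`, `parallel_knn`) and the choice of Euclidean distance with majority voting are assumptions made for illustration, not the authors' implementation. A thread pool stands in for the intra-node level only.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import heapq
import math


def knn_classify(query, train_points, train_labels, k):
    """Label one query by majority vote among its k nearest training points."""
    nearest = heapq.nsmallest(
        k,
        zip(train_points, train_labels),
        key=lambda pair: math.dist(query, pair[0]),  # Euclidean distance (assumed)
    )
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def classify_chunk(chunk, train_points, train_labels, k):
    """Classify one block of queries: the unit of work given to a single worker."""
    return [knn_classify(q, train_points, train_labels, k) for q in chunk]


def split(items, parts):
    """Split a list into `parts` contiguous blocks of near-equal size."""
    size, extra = divmod(len(items), parts)
    blocks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        blocks.append(items[start:end])
        start = end
    return blocks


def parallel_knn(queries, train_points, train_labels, k=3, workers=4):
    """Classify all queries, one block per worker thread, preserving input order."""
    blocks = split(queries, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda block: classify_chunk(block, train_points, train_labels, k),
            blocks,
        )
        return [label for block in results for label in block]
```

Because every query is classified independently, the same block decomposition could in principle be reused at coarser grain, for example one block per MPI process or per Grid job, although the paper may partition the problem differently. Note that in CPython the global interpreter lock prevents this pure-Python loop from gaining real speed from threads; the sketch shows the structure of the decomposition rather than its performance.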


Parallel Implementation; Command Line Interface; POSIX Thread; Workload Management System; Batch Queue
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




  1. Cover, T.M., Hart, P.E.: Nearest neighbor pattern classification. IEEE Trans. on Information Theory 13(1), 21–27 (1967)
  2. Foster, I., Kesselman, C., Tuecke, S.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International J. Supercomputer Applications 15(3) (2001), http://www.globus.org/research/papers/anatomy.pdf
  3. LCG: World Wide Web Computing Grid. Distributed Production Environment of Physics Data Processing. http://lcg.web.cern.ch/LCG
  4. Message Passing Interface Forum: MPI: A message-passing interface standard (2003), http://www.mpi-forum.org/
  5. Gropp, W., et al.: MPI: The Complete Reference. MIT Press, Cambridge (1998)
  6. Drepper, U., Molnar, I.: The Native POSIX Thread Library for Linux (2003), http://people.redhat.com/drepper/nptl-design.pdf
  7. Frank, E., Hall, M., L.T.: Weka 3: Data Mining Software in Java (2005), http://www.cs.waikato.ac.nz/ml/weka

Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • G. Aparício
    • 1
  • I. Blanquer
    • 1
  • V. Hernández
    • 1
  1. Instituto de las Aplicaciones de las Tecnologías de la Información y Comunicaciones Avanzadas - ITACA, Universidad Politécnica de Valencia, Camino de Vera s/n, 46022 Valencia, Spain
