A Distributed Framework for Parallel Data Mining Using HPJava

Abstract

Java has become a language of choice for applications executing in heterogeneous environments utilising distributed objects and multithreading. Handling large data sets requires scalable and efficient implementations of data mining approaches, which generally employ computationally intensive algorithms. Conventional Java implementations do not directly support the data structures often encountered in such algorithms, and they also lack repeatability in numerical precision across platforms. This paper describes a distributed framework that employs task and data parallelism and is implemented in high performance Java (HPJava). Issues of interest for data mining algorithms are identified, and possible solutions are discussed for overcoming limitations in the Java Virtual Machine. The framework supports parallelism across workstation clusters, using the Message Passing Interface (MPI) as middleware, and can support different analysis algorithms, wrapped as Java objects and linked to various databases through the Java Database Connectivity (JDBC) interface. Guidelines are provided for implementing parallel and distributed data mining on large data sets, and a proof-of-concept data mining application is analysed using a neural network.

About this article

Cite this article

Rana, O., Fisk, D. A Distributed Framework for Parallel Data Mining Using HPJava. BT Technology Journal 17, 146–154 (1999). https://doi.org/10.1023/A:1009696924527

  • DOI: https://doi.org/10.1023/A:1009696924527

Keywords

  • Data Mining
  • Virtual Machine
  • Mining Algorithm
  • Data Mining Algorithm
  • Java Virtual Machine