Compiler and Runtime Support for Shared Memory Parallelization of Data Mining Algorithms

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 2481)

Abstract

Data mining techniques focus on finding novel and useful patterns or models in large datasets. Because of the volume of data to be analyzed, the amount of computation involved, and the need for rapid or even interactive analysis, data mining applications require parallel machines. We have been building compiler and runtime support for scalable implementations of data mining algorithms. Our work encompasses shared memory parallelization, distributed memory parallelization, and optimizations for processing disk-resident datasets.

In this paper, we focus on compiler and runtime support for shared memory parallelization of data mining algorithms. We have developed a set of parallelization techniques that apply across algorithms for a variety of mining tasks. We describe the interface of the middleware where these techniques are implemented. Then, we present compiler techniques for translating data parallel code to the middleware specification. Finally, we present a brief evaluation of our compiler using apriori association mining and k-means clustering.
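The abstract does not spell out the parallelization techniques or the middleware interface, so the following is only an illustrative sketch of one common way to parallelize a mining loop on shared memory: each thread accumulates into a private copy of the reduction data, and the copies are merged afterwards. It is applied here to a single k-means iteration. The function names (`nearest`, `local_reduce`, `kmeans_step`) and the `num_threads` parameter are hypothetical and are not taken from the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def nearest(point, centers):
    """Index of the center with the smallest squared Euclidean distance."""
    return min(range(len(centers)),
               key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centers[c])))


def local_reduce(chunk, centers):
    """Accumulate per-cluster coordinate sums and counts into a private copy."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for point in chunk:
        c = nearest(point, centers)
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += point[d]
    return sums, counts


def kmeans_step(points, centers, num_threads=4):
    """One k-means iteration: private per-thread reduction, then a serial merge."""
    # Assumption: a simple strided split of the points across threads.
    chunks = [points[i::num_threads] for i in range(num_threads)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = list(pool.map(lambda ch: local_reduce(ch, centers), chunks))

    # Merge the private copies into one global set of sums and counts.
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for part_sums, part_counts in partials:
        for c in range(len(centers)):
            counts[c] += part_counts[c]
            for d in range(dim):
                sums[c][d] += part_sums[c][d]

    # Keep the old center when a cluster received no points.
    return [[s / counts[c] for s in sums[c]] if counts[c] else list(centers[c])
            for c in range(len(centers))]
```

Because no thread writes to shared accumulators, the update loop needs no locks; the cost is one extra copy of the sums and counts per thread plus the merge. In CPython the global interpreter lock limits the actual speedup of such threads, so the sketch shows the structure of the computation rather than its performance.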

This work was supported by NSF grant ACR-9982087, NSF CAREER award ACR-9733520, and NSF grant ACR-0130437.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, X., Jin, R., Agrawal, G. (2005). Compiler and Runtime Support for Shared Memory Parallelization of Data Mining Algorithms. In: Pugh, B., Tseng, CW. (eds) Languages and Compilers for Parallel Computing. LCPC 2002. Lecture Notes in Computer Science, vol 2481. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11596110_18

  • DOI: https://doi.org/10.1007/11596110_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-30781-5

  • Online ISBN: 978-3-540-31612-1

  • eBook Packages: Computer Science, Computer Science (R0)
