Compiler and Runtime Support for Shared Memory Parallelization of Data Mining Algorithms

Li, Xiaogang; Jin, Ruoming; Agrawal, Gagan

doi:10.1007/11596110_18

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 2481)

Abstract

Data mining techniques focus on finding novel and useful patterns or models in large datasets. Because of the volume of data to be analyzed, the amount of computation involved, and the need for rapid or even interactive analysis, data mining applications require the use of parallel machines. We have been building compiler and runtime support for scalable implementations of data mining algorithms. Our work encompasses shared memory parallelization, distributed memory parallelization, and optimizations for processing disk-resident datasets.

In this paper, we focus on compiler and runtime support for shared memory parallelization of data mining algorithms. We have developed a set of parallelization techniques that apply across algorithms for a variety of mining tasks. We describe the interface of the middleware in which these techniques are implemented. Then, we present compiler techniques for translating data parallel code to the middleware specification. Finally, we present a brief evaluation of our compiler using Apriori association mining and k-means clustering.

This work was supported by NSF grant ACR-9982087, NSF CAREER award ACR-9733520, and NSF grant ACR-0130437.

Author information

Authors and Affiliations

Department of Computer and Information Sciences, Ohio State University, Columbus, OH, 43210, USA
Xiaogang Li, Ruoming Jin & Gagan Agrawal

Editor information

Editors and Affiliations

Department of Computer Science, University of Maryland, 4135 A.V. Williams Bldg., College Park, MD, 20742, USA
Bill Pugh
Department of Computer Science, University of Maryland at College Park
Chau-Wen Tseng

About this paper

Cite this paper

Li, X., Jin, R., Agrawal, G. (2005). Compiler and Runtime Support for Shared Memory Parallelization of Data Mining Algorithms. In: Pugh, B., Tseng, CW. (eds) Languages and Compilers for Parallel Computing. LCPC 2002. Lecture Notes in Computer Science, vol 2481. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11596110_18

DOI: https://doi.org/10.1007/11596110_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30781-5
Online ISBN: 978-3-540-31612-1
eBook Packages: Computer Science, Computer Science (R0)
