Effect of Data Distribution in Parallel Mining of Associations

Cheung, David W.; Xiao, Yongqiao

doi:10.1007/0-306-47011-X_4

Abstract

Association rule mining is an important new problem in data mining, with crucial applications in decision support and marketing strategy. We propose FPM, an efficient parallel algorithm for mining association rules on a distributed shared-nothing parallel system. Its efficiency is attributed to the incorporation of two powerful candidate set pruning techniques, distributed pruning and global pruning, which are sensitive to two data distribution characteristics: data skewness and workload balance. Both prunings are very effective when skewness and balance are both high. We have implemented FPM on an IBM SP2 parallel system. The performance studies show that FPM consistently outperforms CD, a parallel version of the representative Apriori algorithm (Agrawal and Srikant, 1994). The results also validate our observation on the effectiveness of the two pruning techniques with respect to the data distribution characteristics. Furthermore, they show that FPM has good scalability and parallelism, which can be tuned for different business applications.
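The abstract does not spell out how counting is split across nodes, so the following is only a rough single-process sketch of the general idea behind count-based parallel mining: each partition counts itemsets locally, and the local counts are summed to decide which itemsets meet the global support threshold. It also checks a property that follows from simple counting and that makes distribution-aware pruning possible: an itemset that is globally large must be locally large in at least one partition. Whether FPM uses exactly this check is not stated in the abstract. The function names, the toy data, and the use of plain Python lists in place of real nodes are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def local_counts(partition, k):
    """Count every k-itemset occurring in one partition's transactions."""
    counts = Counter()
    for transaction in partition:
        for itemset in combinations(sorted(set(transaction)), k):
            counts[itemset] += 1
    return counts


def count_distribution(partitions, k, min_support):
    """Sum per-partition counts; keep itemsets meeting the global threshold."""
    per_node = [local_counts(p, k) for p in partitions]
    total = Counter()
    for counts in per_node:
        total.update(counts)  # Counter.update adds counts together
    n = sum(len(p) for p in partitions)
    large = {x: c for x, c in total.items() if c >= min_support * n}
    return large, per_node


def locally_large_somewhere(itemset, partitions, per_node, min_support):
    """True if the itemset meets the support threshold in some partition."""
    return any(
        per_node[i][itemset] >= min_support * len(p)
        for i, p in enumerate(partitions)
    )


# Hypothetical toy data: two "nodes", four transactions each.
partitions = [
    [["a", "b"], ["a", "b"], ["a", "c"], ["b", "c"]],
    [["c", "d"], ["c", "d"], ["a", "d"], ["c", "d"]],
]
large, per_node = count_distribution(partitions, k=2, min_support=0.25)
print(large)
```

In this toy data the two partitions hold largely different itemsets, which loosely mirrors the high-skewness case the abstract describes: each globally large pair is concentrated in a single partition.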

References

Agrawal, R., Imielinski, T., and Swami, A. 1993. Mining association rules between sets of items in large databases. Proc. 1993 ACM-SIGMOD Int. Conf. Management of Data. pp. 207–216.
Agrawal, R. and Shafer, J.C. 1996. Parallel mining of association rules: Design, implementation and experience. Special Issue in Data Mining, IEEE Trans. on Knowledge and Data Engineering, IEEE Computer Society, 8(6):962–969.
Agrawal, R. and Srikant, R. 1994. Fast algorithms for mining association rules. Proc. 1994 Int. Conf. Very Large Data Bases. Santiago, Chile, pp. 487–499.
Brin, S., Motwani, R., Ullman, J., and Tsur, S. 1997. Dynamic itemset counting and implication rules for market basket data. Proc. of 1997 ACM-SIGMOD Int. Conf. on Management of Data. Tucson, Arizona, pp. 255–264.
Cheung, D.W., Han, J., Ng, V.T., Fu, A.W., and Fu, Y. 1996. A fast distributed algorithm for mining association rules. Proc. of 4th Int. Conf. on Parallel and Distributed Information Systems. Miami Beach, FL, pp. 31–43.
Cheung, D.W., Han, J., Ng, V.T., and Wong, C.Y. 1996. Maintenance of discovered association rules in large databases: An incremental updating technique. Proc. 1996 IEEE Int. Conf. on Data Engineering. New Orleans, Louisiana.
Cover, T.M. and Thomas, J.A. 1991. Elements of Information Theory. John Wiley & Sons.
Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R. 1995. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press.
Han, J. and Fu, Y. 1995. Discovery of multiple-level association rules from large databases. Proc. 1995 Int. Conf. Very Large Data Bases. Zurich, Switzerland, pp. 420–431.
Han, E., Karypis, G., and Kumar, V. 1997. Scalable parallel data mining for association rules. Proc. of 1997 ACM-SIGMOD Int. Conf. on Management of Data.
Int’l Business Machines. 1995. Scalable POWERparallel Systems, GA23-2475-02 edition.
MacQueen, J.B. 1967. Some methods for classification and analysis of multivariate observations. Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297.
Message Passing Interface Forum. 1994. MPI: A Message-Passing Interface Standard.
Ng, R., Lakshmanan, L., Han, J., and Pang, A. 1998. Exploratory mining and pruning optimizations of constrained association rules. Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data. Seattle, WA.
Park, J.S., Chen, M.S., and Yu, P.S. 1995a. An effective hash-based algorithm for mining association rules. Proc. 1995 ACM-SIGMOD Int. Conf. Management of Data. San Jose, CA, pp. 175–186.
Park, J.S., Chen, M.S., and Yu, P.S. 1995b. Efficient parallel data mining for association rules. Proc. 1995 Int. Conf. on Information and Knowledge Management. Baltimore, MD.
Savasere, A., Omiecinski, E., and Navathe, S. 1995. An efficient algorithm for mining association rules in large databases. Proc. 1995 Int. Conf. Very Large Data Bases. Zurich, Switzerland, pp. 432–444.
Shintani, T. and Kitsuregawa, M. 1996. Hash based parallel algorithms for mining association rules. Proc. of 4th Int. Conf. on Parallel and Distributed Information Systems.
Silberschatz, A., Stonebraker, M., and Ullman, J. 1995. Database research: achievements and opportunities into the 21st century. Report of an NSF Workshop on the Future of Database Systems Research.
Srikant, R. and Agrawal, R. 1995. Mining generalized association rules. Proc. 1995 Int. Conf. Very Large Data Bases. Zurich, Switzerland, pp. 407–419.
Srikant, R. and Agrawal, R. 1996a. Mining sequential patterns: Generalizations and performance improvements. Proc. of the 5th Int. Conf. on Extending Database Technology. Avignon, France.
Srikant, R. and Agrawal, R. 1996b. Mining quantitative association rules in large relational tables. Proc. 1996 ACM-SIGMOD Int. Conf. on Management of Data. Montreal, Canada.
Zaki, M.J., Ogihara, M., Parthasarathy, S., and Li, W. 1996. Parallel data mining for association rules on shared-memory multi-processors. Supercomputing’96, Pittsburgh, PA, Nov. 17–22.

Author information

Authors and Affiliations

Department of Computer Science, The University of Hong Kong, Hong Kong
David W. Cheung & Yongqiao Xiao

Editor information

Editors and Affiliations

Imperial College, UK
Yike Guo
University of Illinois at Chicago, USA
Robert Grossman

About this chapter

Cite this chapter

Cheung, D.W., Xiao, Y. (1999). Effect of Data Distribution in Parallel Mining of Associations. In: Guo, Y., Grossman, R. (eds) High Performance Data Mining. Springer, Boston, MA. https://doi.org/10.1007/0-306-47011-X_4

DOI: https://doi.org/10.1007/0-306-47011-X_4
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-7923-7745-0
Online ISBN: 978-0-306-47011-0
eBook Packages: Springer Book Archive
