Effect of Data Distribution in Parallel Mining of Associations

  • Chapter
High Performance Data Mining

Abstract

Association rule mining is an important new problem in data mining, with crucial applications in decision support and marketing strategy. We propose FPM, an efficient parallel algorithm for mining association rules on a distributed shared-nothing parallel system. Its efficiency is attributed to two powerful candidate set pruning techniques, distributed pruning and global pruning, which are sensitive to two characteristics of the data distribution: data skewness and workload balance. The prunings are very effective when both skewness and balance are high. We have implemented FPM on an IBM SP2 parallel system. Performance studies show that FPM consistently outperforms CD, a parallel version of the representative Apriori algorithm (Agrawal and Srikant, 1994). The results also validate our observation on the effectiveness of the two pruning techniques with respect to the data distribution characteristics. Furthermore, they show that FPM has good scalability and parallelism, which can be tuned for different business applications.
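
The abstract does not spell out how distributed pruning works. As a rough illustration only, the sketch below assumes the standard pigeonhole property used in partitioned association mining: an itemset whose support is below the threshold in every partition cannot reach the threshold globally, so only itemsets that are locally frequent in at least one partition need a global count. The function names, the toy data, and the brute-force candidate enumeration are all hypothetical; this is not the FPM implementation.

```python
from collections import Counter
from itertools import combinations


def local_counts(partition, k):
    """Count every k-itemset that occurs in one partition's transactions."""
    counts = Counter()
    for transaction in partition:
        for itemset in combinations(sorted(transaction), k):
            counts[itemset] += 1
    return counts


def frequent_itemsets(partitions, k, min_support):
    """Return (globally frequent k-itemsets, number of candidates kept).

    min_support is a relative threshold in [0, 1]. A candidate is kept
    only if it is locally frequent in at least one partition (assumed
    pigeonhole-style pruning, not the authors' exact procedure).
    """
    per_site = [local_counts(p, k) for p in partitions]

    # Pruning step: collect itemsets that meet the threshold somewhere.
    survivors = set()
    for partition, counts in zip(partitions, per_site):
        local_threshold = min_support * len(partition)
        survivors |= {x for x, c in counts.items() if c >= local_threshold}

    # Global count only for the surviving candidates.
    total = sum(len(p) for p in partitions)
    frequent = {
        x for x in survivors
        if sum(counts[x] for counts in per_site) >= min_support * total
    }
    return frequent, len(survivors)


# Toy, highly skewed data: each partition favours a different item pair.
site_1 = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"a", "b"}]
site_2 = [{"c", "d"}, {"c", "d"}, {"c", "d"}, {"b", "d"}]

frequent, kept = frequent_itemsets([site_1, site_2], k=2, min_support=0.375)
print(sorted(frequent), kept)
```

In this toy data four distinct 2-itemsets occur, but only two of them are locally frequent anywhere, so only those two are counted globally. That is consistent with the abstract's claim that pruning is more effective when the data are skewed across partitions.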

References

  • Agrawal, R., Imielinski, T., and Swami, A. 1993. Mining association rules between sets of items in large databases. Proc. 1993 ACM-SIGMOD Int. Conf. Management of Data. pp. 207–216.

  • Agrawal, R. and Shafer, J.C. 1996. Parallel mining of association rules: Design, implementation and experience. Special Issue in Data Mining, IEEE Trans. on Knowledge and Data Engineering, IEEE Computer Society, 8(6):962–969.

  • Agrawal, R. and Srikant, R. 1994. Fast algorithms for mining association rules. Proc. 1994 Int. Conf. Very Large Data Bases. Santiago, Chile, pp. 487–499.

  • Brin, S., Motwani, R., Ullman, J., and Tsur, S. 1997. Dynamic itemset counting and implication rules for market basket data. Proc. of 1997 ACM-SIGMOD Int. Conf. On Management of Data. Tucson, Arizona, pp. 255–264.

  • Cheung, D.W., Han, J., Ng, V.T., Fu, A.W., and Fu, Y. 1996. A fast distributed algorithm for mining association rules. Proc. of 4th Int. Conf. on Parallel and Distributed Information Systems. Miami Beach, FL, pp. 31–43.

  • Cheung, D.W., Han, J., Ng, V.T., and Wong, C.Y. 1996. Maintenance of discovered association rules in large databases: An incremental updating technique. Proc. 1996 IEEE Int. Conf. on Data Engineering. New Orleans, Louisiana.

  • Cover, T.M. and Thomas, J.A. 1991. Elements of Information Theory. John Wiley & Sons.

  • Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R. 1995. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press.

  • Han, J. and Fu, Y. 1995. Discovery of multiple-level association rules from large databases. Proc. 1995 Int. Conf. Very Large Data Bases. Zurich, Switzerland, pp. 420–431.

  • Han, E., Karypis, G., and Kumar, V. 1997. Scalable parallel data mining for association rules. Proc. of 1997 ACM-SIGMOD Int. Conf. On Management of Data.

  • Int’l Business Machines. 1995. Scalable POWERparallel Systems, GA23-2475-02 edition.

  • MacQueen, J.B. 1967. Some methods for classification and analysis of multivariate observations. Proceedings of the 5th Berkeley symposium on mathematical statistics and probability, pp. 281–297.

  • Message Passing Interface Forum. 1994. MPI: A Message-Passing Interface Standard.

  • Ng, R., Lakshmanan, L., Han, J., and Pang, A. 1998. Exploratory mining and pruning optimizations of constrained association rules. Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data. Seattle, WA.

  • Park, J.S., Chen, M.S., and Yu, P.S. 1995a. An effective hash-based algorithm for mining association rules. Proc. 1995 ACM-SIGMOD Int. Conf. Management of Data. San Jose, CA, pp. 175–186.

  • Park, J.S., Chen, M.S., and Yu, P.S. 1995b. Efficient parallel data mining for association rules. Proc. 1995 Int. Conf. on Information and Knowledge Management. Baltimore, MD.

  • Savasere, A., Omiecinski, E., and Navathe, S. 1995. An efficient algorithm for mining association rules in large databases. Proc. 1995 Int. Conf. Very Large Data Bases. Zurich, Switzerland, pp. 432–444.

  • Shintani, T. and Kitsuregawa, M. 1996. Hash based parallel algorithms for mining association rules. Proc. of 4th Int. Conf. on Parallel and Distributed Information Systems.

  • Silberschatz, A., Stonebraker, M., and Ullman, J. 1995. Database research: achievements and opportunities into the 21st century. Report of an NSF Workshop on the Future of Database Systems Research.

  • Srikant, R. and Agrawal, R. 1995. Mining generalized association rules. Proc. 1995 Int. Conf. Very Large Data Bases. Zurich, Switzerland, pp. 407–419.

  • Srikant, R. and Agrawal, R. 1996a. Mining sequential patterns: Generalizations and performance improvements. Proc. of the 5th Int. Conf. on Extending Database Technology. Avignon, France.

  • Srikant, R. and Agrawal, R. 1996b. Mining quantitative association rules in large relational tables. Proc. 1996 ACM-SIGMOD Int. Conf. on Management of Data. Montreal, Canada.

  • Zaki, M.J., Ogihara, M., Parthasarathy, S., and Li, W. 1996. Parallel data mining for association rules on shared-memory multi-processors. Supercomputing’96, Pittsburgh, PA, Nov. 17–22.

Copyright information

© 1999 Kluwer Academic Publishers

About this chapter

Cite this chapter

Cheung, D.W., Xiao, Y. (1999). Effect of Data Distribution in Parallel Mining of Associations. In: Guo, Y., Grossman, R. (eds) High Performance Data Mining. Springer, Boston, MA. https://doi.org/10.1007/0-306-47011-X_4

  • DOI: https://doi.org/10.1007/0-306-47011-X_4

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-7923-7745-0

  • Online ISBN: 978-0-306-47011-0

  • eBook Packages: Springer Book Archive
