Efficient Parallel Algorithms for Mining Associations

  • Conference paper
Large-Scale Parallel Data Mining

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1759)

Abstract

The problem of mining hidden associations present in large amounts of data has seen widespread applications in many practical domains, such as customer-oriented planning and marketing, telecommunication network monitoring, and the analysis of data from scientific experiments. The combinatorial complexity of the problem and the phenomenal growth in the sizes of available datasets motivate the need for efficient and scalable parallel algorithms. The design of such algorithms is challenging. This chapter presents an evolutionary and comparative review of many existing representative serial and parallel algorithms for discovering two kinds of associations. The first part of the chapter is devoted to non-sequential associations, which utilize the relationships between events that happen together. The second part is devoted to the more general and potentially more useful sequential associations, which utilize the temporal or sequential relationships between events. It is shown that many existing algorithms actually belong to a few categories, which are determined by the broader design strategies. Overall, the aim of the chapter is to provide a comprehensive account of the challenges and issues involved in effective parallel formulations of algorithms for discovering associations, and of how various existing algorithms try to handle them.
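
To make the distinction between the two kinds of associations concrete, the following is a minimal serial sketch and not code from the chapter. It assumes the simplest possible definitions: a non-sequential association is supported by a transaction that contains all of its items, and a sequential association is supported by an event sequence that contains its events in the same order (gaps allowed, no timing constraints). The function names `itemset_support`, `is_subsequence`, and `sequence_support`, and the sample data, are hypothetical.

```python
from typing import Hashable, Iterable, Sequence


def itemset_support(transactions: Sequence[Iterable[Hashable]],
                    itemset: Iterable[Hashable]) -> float:
    """Fraction of transactions that contain every item of `itemset` (order ignored)."""
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= set(t))
    return hits / len(transactions)


def is_subsequence(pattern: Sequence[Hashable], events: Iterable[Hashable]) -> bool:
    """True if `pattern` occurs in `events` in the same order, gaps allowed."""
    it = iter(events)
    # Each `any` call consumes the shared iterator up to the first match,
    # so later pattern elements can only match later events.
    return all(any(p == e for e in it) for p in pattern)


def sequence_support(sequences: Sequence[Iterable[Hashable]],
                     pattern: Sequence[Hashable]) -> float:
    """Fraction of event sequences that contain `pattern` as an ordered subsequence."""
    hits = sum(1 for s in sequences if is_subsequence(pattern, s))
    return hits / len(sequences)


# Hypothetical data: each inner list is one customer's purchases, in time order.
data = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter"],
    ["milk", "bread"],
]

print(itemset_support(data, {"bread", "milk"}))   # order ignored
print(sequence_support(data, ["bread", "milk"]))  # bread must precede milk
```

On this data the itemset {bread, milk} is supported by three of the four records, while the ordered pattern bread followed by milk is supported by only two, since the last record has the events in the opposite order. Counting support over many candidate patterns in this way is the step whose combinatorial cost motivates the parallel formulations reviewed in the chapter.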

This work was supported by NSF grant ACI-9982274, by Army High Performance Computing Research Center cooperative agreement number DAAH04-95-2-0003/contract number DAAH04-95-C-0008, the content of which does not necessarily reflect the position or the policy of the government, and no official endorsement should be inferred. Access to computing facilities was provided by AHPCRC, Minnesota Supercomputer Institute. Related papers are available via WWW at URL: http://www.cs.umn.edu/~kumar.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Joshi, M.V., Han, EH.S., Karypis, G., Kumar, V. (2002). Efficient Parallel Algorithms for Mining Associations. In: Zaki, M.J., Ho, CT. (eds) Large-Scale Parallel Data Mining. Lecture Notes in Computer Science, vol 1759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46502-2_5

  • DOI: https://doi.org/10.1007/3-540-46502-2_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67194-7

  • Online ISBN: 978-3-540-46502-7

  • eBook Packages: Springer Book Archive
