Large-Scale Parallel Data Mining pp 83-126
Efficient Parallel Algorithms for Mining Associations
Abstract
The problem of mining hidden associations in large amounts of data has widespread applications in many practical domains, such as customer-oriented planning and marketing, telecommunication network monitoring, and the analysis of data from scientific experiments. The combinatorial complexity of the problem and the phenomenal growth in the sizes of available datasets motivate the need for efficient and scalable parallel algorithms. The design of such algorithms is challenging. This chapter presents an evolutionary and comparative review of many existing representative serial and parallel algorithms for discovering two kinds of associations. The first part of the chapter is devoted to non-sequential associations, which utilize the relationships between events that happen together. The second part is devoted to the more general and potentially more useful sequential associations, which utilize the temporal or sequential relationships between events. It is shown that many existing algorithms actually belong to a few categories, which are determined by the broader design strategies. Overall, the aim of the chapter is to provide a comprehensive account of the challenges and issues involved in effective parallel formulations of algorithms for discovering associations, and of how various existing algorithms try to handle them.
Keywords
Association Rule, Parallel Algorithm, Hash Table, Frequent Itemsets, Count Distribution