Abstract
Mining hidden associations in large amounts of data has found widespread application in practical domains such as customer-oriented planning and marketing, telecommunication network monitoring, and the analysis of data from scientific experiments. The combinatorial complexity of the problem and the phenomenal growth in the sizes of available datasets motivate the need for efficient and scalable parallel algorithms, whose design is challenging. This chapter presents an evolutionary and comparative review of representative serial and parallel algorithms for discovering two kinds of associations. The first part of the chapter is devoted to non-sequential associations, which utilize the relationships between events that happen together. The second part is devoted to the more general and potentially more useful sequential associations, which utilize the temporal or sequential relationships between events. It is shown that many existing algorithms belong to a few categories determined by their broader design strategies. Overall, the aim of the chapter is to provide a comprehensive account of the challenges and issues involved in effective parallel formulations of algorithms for discovering associations, and of how various existing algorithms try to handle them.
This work was supported by NSF grant ACI-9982274, by Army High Performance Computing Research Center cooperative agreement number DAAH04-95-2-0003/contract number DAAH04-95-C-0008, the content of which does not necessarily reflect the position or the policy of the government, and no official endorsement should be inferred. Access to computing facilities was provided by AHPCRC, Minnesota Supercomputer Institute. Related papers are available via WWW at URL: http://www.cs.umn.edu/~kumar.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Joshi, M.V., Han, EH.S., Karypis, G., Kumar, V. (2002). Efficient Parallel Algorithms for Mining Associations. In: Zaki, M.J., Ho, CT. (eds) Large-Scale Parallel Data Mining. Lecture Notes in Computer Science(), vol 1759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46502-2_5
DOI: https://doi.org/10.1007/3-540-46502-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67194-7
Online ISBN: 978-3-540-46502-7
eBook Packages: Springer Book Archive