Datamining in Grid Environment
The paper assesses the performance improvements and some implementation issues of two well-known data mining algorithms, Apriori and FP-growth, in the Alchemi grid environment. We compare the execution times and speed-up of two parallel implementations, a pure Apriori version and a hybrid FP-growth/Apriori version, on a grid with one to six processors. As expected, the hybrid version performs better. We also discuss how database characteristics affect overall performance and give guidelines for choosing execution parameters and a suitable number of executors.
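The general idea of parallelizing Apriori over several executors can be illustrated with a count-distribution sketch: each executor counts candidate itemsets on its own slice of the database, the partial counts are summed, and itemsets below the minimal support threshold are dropped before the next level. This is a minimal single-process sketch under stated assumptions, not the authors' Alchemi implementation: the round-robin `partition` helper, the absolute `min_support` count, and the plain loop standing in for grid executors are all hypothetical, and the hybrid FP-growth/Apriori variant is not shown. Speed-up is computed with the usual definition S_p = T_1 / T_p.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Set

Itemset = FrozenSet[str]


def partition(transactions: Sequence[Sequence[str]], n_executors: int) -> List[Sequence[Sequence[str]]]:
    # Hypothetical round-robin split, standing in for shipping DB chunks to grid executors.
    return [transactions[i::n_executors] for i in range(n_executors)]


def count_local(chunk: Sequence[Sequence[str]], candidates: Set[Itemset]) -> Counter:
    # Work done by one executor: support counts of all candidates on its slice only.
    counts: Counter = Counter()
    for t in chunk:
        ts = set(t)
        for c in candidates:
            if c <= ts:
                counts[c] += 1
    return counts


def generate_candidates(frequent_prev: Set[Itemset], k: int) -> Set[Itemset]:
    # Apriori join step: unions of two frequent (k-1)-itemsets that have exactly k items.
    prev = list(frequent_prev)
    cands: Set[Itemset] = set()
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            u = prev[i] | prev[j]
            # Apriori prune step: every (k-1)-subset must itself be frequent.
            if len(u) == k and all(frozenset(s) in frequent_prev for s in combinations(u, k - 1)):
                cands.add(u)
    return cands


def parallel_apriori(transactions: Sequence[Sequence[str]], min_support: int,
                     n_executors: int) -> Dict[Itemset, int]:
    chunks = partition(transactions, n_executors)
    candidates: Set[Itemset] = {frozenset([item]) for t in transactions for item in t}
    frequent: Dict[Itemset, int] = {}
    k = 1
    while candidates:
        total: Counter = Counter()
        for chunk in chunks:  # on a real grid these calls would run on separate executors
            total.update(count_local(chunk, candidates))  # merge partial counts
        level = {c: n for c, n in total.items() if n >= min_support}
        frequent.update(level)
        k += 1
        candidates = generate_candidates(set(level), k)
    return frequent


def speedup(t1: float, tp: float) -> float:
    # S_p = T_1 / T_p: run time on one processor divided by run time on p processors.
    return t1 / tp
```

Because every executor counts the same candidates over a disjoint slice of the database, the merged counts, and therefore the frequent itemsets, are the same for any number of executors; on a real grid only the execution time, and hence the speed-up, would change.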
Keywords: Association Rule, Frequent Itemsets, Grid Environment, Mining Frequent Pattern, Minimal Support Threshold