Abstract
We propose a new algorithm for searching for frequent itemsets in large databases. The idea is to start the search from a set of representative examples instead of testing the 1-itemsets, then the k-itemsets, and so on. A clustering algorithm is first applied to group the transactions into k clusters, and each cluster is represented by its most representative example. The set of k representative examples is then used as the starting point for searching for frequent itemsets. We show some preliminary results and then propose a parallel version of the algorithm based on the MapReduce framework.
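The abstract does not give the details of the algorithm, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes Jaccard distance between transactions, a simple alternating k-medoids loop to pick one representative per cluster, and a top-down search that shrinks each representative until a frequent itemset is found. The names `jaccard_distance`, `k_medoids`, `support`, and `frequent_from_seeds` are hypothetical and are not taken from the paper.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """Jaccard distance between two itemsets (assumed dissimilarity measure)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def k_medoids(transactions, k, max_iter=20):
    """Alternating k-medoids: assign each transaction to its nearest medoid,
    then re-pick each cluster's medoid, until the medoids stop changing."""
    # Deterministic start: the first k distinct transactions (an assumption).
    medoids = list(dict.fromkeys(transactions))[:k]
    for _ in range(max_iter):
        clusters = {m: [] for m in medoids}
        for t in transactions:
            nearest = min(medoids, key=lambda m: jaccard_distance(t, m))
            clusters[nearest].append(t)
        # The new medoid minimizes the total distance to its cluster members.
        new_medoids = [
            min(members, key=lambda c: sum(jaccard_distance(c, t) for t in members))
            for members in clusters.values() if members
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return medoids

def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def frequent_from_seeds(transactions, seeds, min_support):
    """Top-down search from the seeds: if a candidate is frequent, all of its
    non-empty subsets are frequent too; otherwise try it with one item removed."""
    frequent, visited = set(), set()
    stack = list(seeds)
    while stack:
        candidate = stack.pop()
        if not candidate or candidate in visited:
            continue
        visited.add(candidate)
        if support(candidate, transactions) >= min_support:
            for r in range(1, len(candidate) + 1):
                frequent.update(frozenset(c) for c in combinations(sorted(candidate), r))
        else:
            stack.extend(candidate - {item} for item in candidate)
    return frequent

# Toy data, invented for illustration only.
transactions = [frozenset(t) for t in ("abc", "abc", "ab", "de", "def", "de")]
medoids = k_medoids(transactions, k=2)
frequent = frequent_from_seeds(transactions, medoids, min_support=3)
```

Note that this sketch only finds frequent itemsets that are subsets of some representative, so a real implementation would need a further step to cover itemsets the seeds miss. The `support` count is a sum over transactions, which is the part that would map most naturally onto a MapReduce job.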
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Malek, M., Kadima, H. (2013). Searching Frequent Itemsets by Clustering Data: Towards a Parallel Approach Using Mapreduce. In: Haller, A., Huang, G., Huang, Z., Paik, Hy., Sheng, Q.Z. (eds) Web Information Systems Engineering – WISE 2011 and 2012 Workshops. WISE WISE 2011 2012. Lecture Notes in Computer Science, vol 7652. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38333-5_26
DOI: https://doi.org/10.1007/978-3-642-38333-5_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38332-8
Online ISBN: 978-3-642-38333-5
eBook Packages: Computer Science, Computer Science (R0)