Searching Frequent Itemsets by Clustering Data: Towards a Parallel Approach Using Mapreduce

Malek, Maria; Kadima, Hubert

doi:10.1007/978-3-642-38333-5_26

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7652)

Abstract

We propose a new algorithm for searching for frequent itemsets in large databases. The idea is to start the search from a set of representative examples instead of testing the 1-itemsets, the 2-itemsets, and so on level by level. A clustering algorithm is first applied to group the transactions into k clusters, each represented by its most representative example. The set of k representative examples is then used as the starting point for searching for frequent itemsets. We show some preliminary results and then propose a parallel version of this algorithm based on the MapReduce framework.
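The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the clustering algorithm, the distance measure, or the search strategy, so the choices below (a simple k-medoids loop, Jaccard distance, and a downward search through the subsets of each representative) are assumptions, and all function names are hypothetical.

```python
def jaccard_distance(a, b):
    """Distance between two transactions (frozensets of items)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def k_medoids(transactions, k, max_iter=20):
    """Assumed clustering step: return one representative per cluster."""
    # Deterministic initialisation: the first k distinct transactions.
    medoids = []
    for t in transactions:
        if t not in medoids:
            medoids.append(t)
        if len(medoids) == k:
            break
    for _ in range(max_iter):
        # Assign every transaction to its nearest medoid.
        clusters = [[] for _ in medoids]
        for t in transactions:
            nearest = min(range(len(medoids)),
                          key=lambda i: jaccard_distance(t, medoids[i]))
            clusters[nearest].append(t)
        # The new medoid minimises the total distance to its cluster members.
        new_medoids = []
        for members, old in zip(clusters, medoids):
            if not members:
                new_medoids.append(old)
                continue
            new_medoids.append(min(
                members,
                key=lambda c: sum(jaccard_distance(c, m) for m in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids


def frequent_from_representatives(transactions, k, min_support):
    """Search downward from each representative instead of upward from 1-itemsets."""
    frequent = set()
    for rep in k_medoids(transactions, k):
        level, seen = {rep}, set()
        while level:
            next_level = set()
            for cand in level:
                if not cand or cand in seen:
                    continue
                seen.add(cand)
                if support(cand, transactions) >= min_support:
                    # Every subset of a frequent itemset is also frequent,
                    # so the descent can stop at this candidate.
                    frequent.add(cand)
                else:
                    # Infrequent: try each subset with one item removed.
                    for item in cand:
                        next_level.add(cand - {item})
            level = next_level
    return frequent


transactions = [frozenset(s) for s in ("abc", "abc", "ab", "xy", "xy", "xyz")]
result = frequent_from_representatives(transactions, k=2, min_support=3)
print(sorted("".join(sorted(s)) for s in result))
```

With this toy data the two representatives are {x, y} and {a, b, c}. The sketch reports only the frequent itemsets it meets while descending from a representative, so it is not guaranteed to enumerate every frequent itemset in the database. In a MapReduce setting, the support counting would be the natural candidate for distribution, but that part is not sketched here.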

Author information

Authors and Affiliations

EISTI-LARIS Laboratory, Ave du Parc, 95011, Cergy-Pontoise, France
Maria Malek & Hubert Kadima

Editor information

Editors and Affiliations

Information Engineering Laboratory, CSIRO ICT Centre, Australia
Armin Haller
Victoria University, Melbourne, Australia
Guangyan Huang
Department of Computer Science, Vrije University, Amsterdam, The Netherlands
Zhisheng Huang
The University of New South Wales, Sydney, NSW, Australia
Hye-young Paik
Department of Computer Science, Adelaide University, 5005, Adelaide, SA, Australia
Quan Z. Sheng

Cite this paper

Malek, M., Kadima, H. (2013). Searching Frequent Itemsets by Clustering Data: Towards a Parallel Approach Using Mapreduce. In: Haller, A., Huang, G., Huang, Z., Paik, Hy., Sheng, Q.Z. (eds) Web Information Systems Engineering – WISE 2011 and 2012 Workshops. WISE 2011, WISE 2012. Lecture Notes in Computer Science, vol 7652. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38333-5_26

DOI: https://doi.org/10.1007/978-3-642-38333-5_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38332-8
Online ISBN: 978-3-642-38333-5
eBook Packages: Computer Science, Computer Science (R0)