
Searching Frequent Itemsets by Clustering Data: Towards a Parallel Approach Using Mapreduce

  • Conference paper
Web Information Systems Engineering – WISE 2011 and 2012 Workshops (WISE 2011, WISE 2012)

Abstract

We propose a new algorithm for searching frequent itemsets in large databases. Instead of testing the 1-itemsets, then the 2-itemsets, and so on, the search starts from a set of representative examples. A clustering algorithm is first applied to group the transactions into k clusters, each represented by its most representative example. The set of these k representative examples is then used as the starting point for searching frequent itemsets. We present some preliminary results and then propose a parallel version of the algorithm based on the MapReduce framework.
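The abstract describes the pipeline only at a high level. The sketch below is one possible reading of it under stated assumptions, not the authors' implementation. It assumes Jaccard distance between transactions, a simple alternating k-medoids procedure (the "most representative example" read as the cluster medoid), and a top-down search that tests each representative itemset and, when it is infrequent, its subsets with one item fewer. Support counting is written as a map step per data partition followed by a reduce that sums the partial counts, mimicking how a MapReduce job would split the work. All function names (`jaccard_distance`, `k_medoids`, `map_partition`, `count_support`, `frequent_from_representatives`) and the toy data are hypothetical.

```python
import random
from collections import Counter
from functools import reduce
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance: 1 minus (intersection size / union size); two empty sets are at 0."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def k_medoids(transactions, k, max_iter=20, seed=0):
    """Alternating k-medoids; each cluster is represented by a real transaction (its medoid).

    Clustering runs over distinct transactions, with duplicates weighted by their count.
    """
    weight = Counter(frozenset(t) for t in transactions)
    points = list(weight)
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    for _ in range(max_iter):
        # Assignment step: every distinct transaction joins its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: jaccard_distance(p, points[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid minimizes the weighted distance to its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(
                weight[points[j]] * jaccard_distance(points[c], points[j]) for j in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return [points[m] for m in medoids]


def map_partition(partition, candidates):
    """Map step: local support count of each candidate within one data partition."""
    counts = Counter()
    for t in partition:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts


def count_support(partitions, candidates):
    """Reduce step: sum the partial counts emitted for every partition."""
    return reduce(lambda a, b: a + b,
                  (map_partition(p, candidates) for p in partitions), Counter())


def frequent_from_representatives(partitions, representatives, min_support):
    """Top-down search starting from the representative itemsets.

    A frequent candidate is kept; an infrequent one is replaced by its subsets
    with one item fewer. Returns the maximal frequent itemsets reached.
    """
    frequent = set()
    level = {frozenset(r) for r in representatives if r}
    while level:
        support = count_support(partitions, level)
        next_level = set()
        for c in level:
            if support[c] >= min_support:
                frequent.add(c)
            elif len(c) > 1:
                next_level.update(frozenset(s) for s in combinations(c, len(c) - 1))
        # Subsets of an itemset already known to be frequent need no counting.
        level = {c for c in next_level if not any(c <= f for f in frequent)}
    return {f for f in frequent if not any(f < g for g in frequent)}


# Toy run: two partitions stand in for the input splits of a MapReduce job.
transactions = [frozenset(t) for t in (
    {"bread", "milk"}, {"bread", "milk", "eggs"}, {"bread", "milk", "butter"},
    {"beer", "chips"}, {"beer", "chips", "salsa"}, {"beer", "chips"},
)]
partitions = [transactions[:3], transactions[3:]]
representatives = k_medoids(transactions, k=2)
result = frequent_from_representatives(partitions, representatives, min_support=2)
print(sorted(sorted(f) for f in result))
```

Because every subset of a frequent itemset is itself frequent, the sketch returns only the maximal frequent itemsets it reaches; their subsets need no further counting. Note that this search only explores subsets of the representatives, and a long infrequent representative can generate many subsets, so a practical version would need additional pruning.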




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Malek, M., Kadima, H. (2013). Searching Frequent Itemsets by Clustering Data: Towards a Parallel Approach Using Mapreduce. In: Haller, A., Huang, G., Huang, Z., Paik, Hy., Sheng, Q.Z. (eds) Web Information Systems Engineering – WISE 2011 and 2012 Workshops. WISE 2011, WISE 2012. Lecture Notes in Computer Science, vol 7652. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38333-5_26


  • DOI: https://doi.org/10.1007/978-3-642-38333-5_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38332-8

  • Online ISBN: 978-3-642-38333-5

  • eBook Packages: Computer Science, Computer Science (R0)
