
Frequent-Itemset Mining Using Locality-Sensitive Hashing

  • Conference paper
Computing and Combinatorics (COCOON 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9797)


Abstract

The Apriori algorithm is a classical algorithm for the frequent-itemset mining problem. Two significant bottlenecks in Apriori are the number of I/O operations it performs and the number of candidate itemsets it generates. We investigate the role of locality-sensitive hashing (LSH) techniques in overcoming these problems without adding much computational overhead. We propose randomized variations of Apriori that are based on asymmetric LSH defined over Hamming distance and Jaccard similarity.
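For reference, here is a minimal sketch of the classical level-wise Apriori loop. It was written for this summary and is not taken from the paper, which proposes randomized LSH-based variations of it. The sketch makes the two bottlenecks named above concrete: every level triggers a full scan of the transactions to count support, and the join and prune steps determine how many candidates must be counted. The function name `apriori` and its `(frequent, passes)` return value are illustrative assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Classical level-wise Apriori (textbook sketch, not the paper's variant).

    transactions: list of sets of items; min_support: minimum absolute count.
    Returns (frequent, passes): a dict mapping each frequent itemset (frozenset)
    to its support, and the number of full scans made over `transactions`.
    """
    passes = 0

    def count_support(candidates):
        # Every call reads every transaction once: this is the I/O bottleneck.
        nonlocal passes
        passes += 1
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            for c in candidates:
                if c <= t:  # candidate itemset is contained in the transaction
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: all single items.
    level = count_support({frozenset([i]) for t in transactions for i in t})
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join step: union pairs of frequent (k-1)-itemsets that form a k-itemset.
        candidates = {x | y for x in prev for y in prev if len(x | y) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        if not candidates:
            break
        level = count_support(candidates)
        frequent.update(level)
        k += 1
    return frequent, passes
```

For example, on the five transactions {a, b, c}, {a, b}, {a, c}, {b, c}, {a, b, c} with `min_support=3`, all three single items and all three pairs are frequent, {a, b, c} (support 2) is not, and the sketch makes three scans of the data.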


Notes

  1. Note that computing support is an I/O-intensive operation and involves reading every transaction.
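Because support counting requires reading every transaction, it helps to view each item as the set of IDs of the transactions that contain it. The support of a pair {x, y} is then |T_x ∩ T_y|. The sketch below uses the standard MinHash construction (random permutations of the transaction IDs) to estimate the Jaccard similarity of two such item columns, then converts that estimate into a pair-support estimate. This is a generic illustration of LSH for Jaccard similarity under our own assumptions, not the asymmetric scheme proposed in the paper. All function names are hypothetical.

```python
import random


def item_columns(transactions):
    """Map each item to the set of IDs of transactions containing it (one scan)."""
    cols = {}
    for tid, t in enumerate(transactions):
        for item in t:
            cols.setdefault(item, set()).add(tid)
    return cols


def minhash_signatures(cols, n_transactions, k, seed=0):
    """Standard MinHash with k random permutations of the transaction IDs.

    Each signature entry is the smallest permuted ID among the item's transactions.
    """
    rng = random.Random(seed)
    perms = []
    for _ in range(k):
        order = list(range(n_transactions))
        rng.shuffle(order)
        perms.append(order)  # perms[j][tid] is the rank of tid under permutation j
    return {item: [min(p[tid] for tid in tids) for p in perms]
            for item, tids in cols.items()}


def estimate_jaccard(sig_x, sig_y):
    # Pr[min-hashes agree] equals the Jaccard similarity of the two ID sets.
    return sum(u == v for u, v in zip(sig_x, sig_y)) / len(sig_x)


def pair_support(jaccard, supp_x, supp_y):
    # From J = |Tx ∩ Ty| / |Tx ∪ Ty| and |Tx ∪ Ty| = supp_x + supp_y - |Tx ∩ Ty|:
    # |Tx ∩ Ty| = J * (supp_x + supp_y) / (1 + J)
    return jaccard * (supp_x + supp_y) / (1 + jaccard)
```

For example, if item a occurs in transactions {0, 1, 2, 4} and item b in {0, 1, 3, 4}, their Jaccard similarity is 3/5. `pair_support(0.6, 4, 4)` then returns 3.0 up to floating-point rounding, which is the exact support of {a, b}. Computing the signatures still costs one scan, but that single scan covers every item at once.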


Author information

Correspondence to Debajyoti Bera.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Bera, D., Pratap, R. (2016). Frequent-Itemset Mining Using Locality-Sensitive Hashing. In: Dinh, T., Thai, M. (eds) Computing and Combinatorics. COCOON 2016. Lecture Notes in Computer Science, vol 9797. Springer, Cham. https://doi.org/10.1007/978-3-319-42634-1_12


  • DOI: https://doi.org/10.1007/978-3-319-42634-1_12


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42633-4

  • Online ISBN: 978-3-319-42634-1

  • eBook Packages: Computer Science, Computer Science (R0)
