Frequent-Itemset Mining Using Locality-Sensitive Hashing

Bera, Debajyoti; Pratap, Rameshwar

doi:10.1007/978-3-319-42634-1_12

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9797)

Included in the following conference series:

International Computing and Combinatorics Conference

Abstract

The Apriori algorithm is a classical algorithm for the frequent-itemset mining problem. Two significant bottlenecks in Apriori are the number of I/O operations involved and the number of candidates it generates. We investigate the role of locality-sensitive hashing (LSH) techniques in overcoming these problems without adding much computational overhead. We propose randomized variations of Apriori that are based on asymmetric LSH defined over Hamming distance and Jaccard similarity.
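
To give a feel for what an LSH over Jaccard similarity provides, here is a minimal sketch of standard min-wise hashing (MinHash), where the fraction of matching signature entries estimates the Jaccard similarity of two sets. This is not the authors' algorithm, and it is the symmetric scheme rather than the asymmetric one named in the abstract. Representing each item by the set of transaction ids that contain it is an assumption made for illustration, and the helper names (`random_permutations`, `minhash_signature`, `estimate_jaccard`) and the toy data are hypothetical.

```python
import random


def jaccard(a, b):
    """Exact Jaccard similarity |a & b| / |a | b| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def random_permutations(universe, k, seed=0):
    """Build k random rankings of the universe (hypothetical helper, fixed seed)."""
    rng = random.Random(seed)
    items = sorted(universe)
    perms = []
    for _ in range(k):
        order = items[:]
        rng.shuffle(order)
        perms.append({x: r for r, x in enumerate(order)})
    return perms


def minhash_signature(s, permutations):
    """One entry per permutation: the smallest rank of any element of s.

    Assumes s is non-empty.
    """
    return [min(rank[x] for x in s) for rank in permutations]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two signatures agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


# Toy data (assumption): ids of the transactions containing item A and item B.
tids_a = set(range(0, 60))
tids_b = set(range(30, 90))

perms = random_permutations(tids_a | tids_b, k=400)
sig_a = minhash_signature(tids_a, perms)
sig_b = minhash_signature(tids_b, perms)

exact = jaccard(tids_a, tids_b)          # 30 shared ids out of 90 in the union
approx = estimate_jaccard(sig_a, sig_b)  # randomized estimate of the same value
```

Once the signatures are built, comparing two items touches only the 400 signature entries and not the transactions themselves, which is the kind of saving that motivates hashing-based approaches in general.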

Notes

1. Note that computing support is an I/O intensive operation and involves reading every transaction.
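
The note can be made concrete with a small sketch of exact support counting. The function name `support`, the returned read counter, and the toy transactions are assumptions for illustration and do not come from the chapter. The point is only that a single support query walks the whole database.

```python
def support(itemset, transactions):
    """Return (count, read): how many transactions contain every item of
    itemset, and how many transactions had to be read to find out."""
    wanted = frozenset(itemset)
    count = 0
    read = 0
    for transaction in transactions:  # one full pass over the database
        read += 1
        if wanted <= set(transaction):  # subset test: all wanted items present
            count += 1
    return count, read


# Toy database (assumption): each transaction is a list of item names.
db = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
    ["bread", "butter"],
]

count, read = support({"bread", "milk"}, db)
```

Here `read` always equals the number of transactions, whatever the itemset, so the cost of a query is tied to the size of the database and not to how frequent the itemset turns out to be.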

Author information

Authors and Affiliations

Indraprastha Institute of Information Technology-Delhi (IIIT-D), New Delhi, India
Debajyoti Bera
TCS Innovation Labs, New Delhi, India
Rameshwar Pratap

Corresponding author

Correspondence to Debajyoti Bera.

Editor information

Editors and Affiliations

Virginia Commonwealth University, Richmond, Virginia, USA
Thang N. Dinh
University of Florida, Gainesville, Florida, USA
My T. Thai

About this paper

Cite this paper

Bera, D., Pratap, R. (2016). Frequent-Itemset Mining Using Locality-Sensitive Hashing. In: Dinh, T., Thai, M. (eds) Computing and Combinatorics. COCOON 2016. Lecture Notes in Computer Science, vol 9797. Springer, Cham. https://doi.org/10.1007/978-3-319-42634-1_12

DOI: https://doi.org/10.1007/978-3-319-42634-1_12
Published: 20 July 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42633-4
Online ISBN: 978-3-319-42634-1
eBook Packages: Computer Science, Computer Science (R0)
