Improved Counter Based Algorithms for Frequent Pairs Mining in Transactional Data Streams

Kutzkov, Konstantin

doi:10.1007/978-3-642-33460-3_59

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7523)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

A straightforward approach to frequent pairs mining in transactional streams is to generate all pairs occurring in transactions and apply a frequent items mining algorithm to the resulting stream. The well-known counter based algorithms Frequent and Space-Saving are known to achieve a very good approximation when the frequencies of the items in the stream adhere to a skewed distribution.
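The straightforward approach described above can be sketched as follows. This is a minimal illustration, not the chapter's implementation: it uses the textbook Frequent (Misra-Gries) update rule with at most `k` counters, and the function name `frequent_pairs`, the parameter `k`, and the representation of a transaction as a list of items are assumptions made for the example.

```python
from itertools import combinations


def frequent_pairs(transactions, k):
    """Naive pair mining: expand each transaction into its pairs and feed
    them to the Frequent (Misra-Gries) algorithm with at most k counters.

    Returns a dict mapping pair -> estimated count. Each estimate is a
    lower bound on the true count, off by at most n / (k + 1), where n is
    the total number of pairs generated from the stream.
    """
    counters = {}
    for transaction in transactions:
        # Sort the distinct items so that (a, b) and (b, a) are the same pair.
        items = sorted(set(transaction))
        for pair in combinations(items, 2):
            if pair in counters:
                counters[pair] += 1
            elif len(counters) < k:
                counters[pair] = 1
            else:
                # No free counter: decrement every counter and drop zeros.
                # The incoming pair itself is not stored.
                for key in list(counters):
                    counters[key] -= 1
                    if counters[key] == 0:
                        del counters[key]
    return counters


if __name__ == "__main__":
    # Hypothetical toy stream of transactions.
    stream = [["a", "b"], ["a", "b"], ["a", "b", "c"], ["a", "b"], ["c", "d"]]
    print(frequent_pairs(stream, k=2))
```

Because a counter is only ever decremented, a reported count never exceeds the true count; a second pass over the data (when available) would be needed to obtain exact frequencies for the surviving candidates.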

Motivated by observations on real datasets, we present a general technique for applying Frequent and Space-Saving to transactional data streams for the case when the transactions vary considerably in length. Despite its simplicity, we show through extensive experiments that our approach is considerably more efficient and precise than the naïve application of Frequent and Space-Saving.
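One way to see why transaction length matters for the naïve approach: a transaction with n distinct items contributes n(n-1)/2 pairs, so a few long transactions can account for most of the generated pair stream. The sketch below illustrates only this arithmetic; the lengths are made-up values, and it does not reproduce the technique proposed in the chapter.

```python
def pairs_in_transaction(length):
    """Number of unordered pairs among `length` distinct items."""
    return length * (length - 1) // 2


# Hypothetical transaction lengths: four short transactions and one long one.
lengths = [2, 3, 3, 4, 50]

pair_counts = [pairs_in_transaction(n) for n in lengths]

# Fraction of the whole pair stream produced by the single longest transaction.
share_of_longest = max(pair_counts) / sum(pair_counts)

if __name__ == "__main__":
    print(pair_counts)
    print(f"{share_of_longest:.1%} of all pairs come from the longest transaction")
```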


Author information

Authors and Affiliations

IT University of Copenhagen, Denmark
Konstantin Kutzkov

Editor information

Editors and Affiliations

Intelligent Systems Laboratory, University of Bristol, Merchant Venturers Building, Woodland Road, BS8 1UB, Bristol, UK
Peter A. Flach, Tijl De Bie & Nello Cristianini

Cite this paper

Kutzkov, K. (2012). Improved Counter Based Algorithms for Frequent Pairs Mining in Transactional Data Streams. In: Flach, P.A., De Bie, T., Cristianini, N. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2012. Lecture Notes in Computer Science(), vol 7523. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33460-3_59

DOI: https://doi.org/10.1007/978-3-642-33460-3_59
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33459-7
Online ISBN: 978-3-642-33460-3
eBook Packages: Computer Science, Computer Science (R0)
