Novel structures for counting frequent items in time decayed streams

Wu, Shanshan; Lin, Huaizhong; U, Leong Hou; Gao, Yunjun; Lu, Dongming

doi:10.1007/s11280-017-0433-5

Published: 24 January 2017

World Wide Web, Volume 20, pages 1111–1133 (2017)

Abstract

Identifying frequently occurring items is a fundamental building block in many data stream applications. Much prior work on efficiently identifying frequent items targets the landmark and sliding window models. In this work, we revisit the problem under a streaming model based on time decay, in which the importance of each arriving item decreases over time. To handle these changes in importance, we propose an innovative heap structure, named Quasi-heap, which maintains the item order using a lazy update mechanism. Two approximation algorithms, Space Saving with Quasi-heap (SSQ) and Filtered Space Saving with Quasi-heap (FSSQ), are proposed to find the frequently occurring items based on the Quasi-heap structure. To achieve better accuracy of frequency estimation for all items in the stream, we introduce a new count-min-min (CMM) sketch structure, which can estimate the count of an item almost error-free. Extensive experiments on both real-world and synthetic data demonstrate the superiority of the proposed methods in terms of both efficiency (i.e., response time) and effectiveness (i.e., accuracy).
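
To make the time-decay setting concrete, the following is a minimal Python sketch of Space Saving style counters whose values decay exponentially and are refreshed lazily, only when a counter is read or updated. It is an illustration under stated assumptions, not the authors' SSQ, FSSQ, Quasi-heap, or CMM sketch: the abstract does not specify the decay function or the data structure details, so the exponential decay, the class name `DecayedSpaceSaving`, its parameters, and the linear scan for the minimum counter are all hypothetical choices made for brevity.

```python
class DecayedSpaceSaving:
    """Illustrative Space Saving counters over exponentially decayed counts.

    Assumption: an item's weight is multiplied by `decay` for every time
    unit that passes. This is NOT the paper's Quasi-heap; the minimum
    counter is found by a plain linear scan.
    """

    def __init__(self, capacity, decay):
        self.capacity = capacity   # maximum number of monitored items
        self.decay = decay         # per-time-unit decay factor, 0 < decay <= 1
        self.counters = {}         # item -> (count, time of last refresh)

    def _value(self, item, now):
        # Lazy update: decay is applied only when the counter is read.
        count, last = self.counters[item]
        return count * self.decay ** (now - last)

    def update(self, item, now):
        if item in self.counters:
            # Refresh the decayed count, then add the new arrival.
            self.counters[item] = (self._value(item, now) + 1.0, now)
        elif len(self.counters) < self.capacity:
            self.counters[item] = (1.0, now)
        else:
            # Space Saving replacement: evict the smallest counter and
            # let the new item inherit its (decayed) count.
            victim = min(self.counters, key=lambda i: self._value(i, now))
            inherited = self._value(victim, now)
            del self.counters[victim]
            self.counters[item] = (inherited + 1.0, now)

    def estimate(self, item, now):
        # Unmonitored items are reported as 0 in this simplified sketch.
        return self._value(item, now) if item in self.counters else 0.0


ss = DecayedSpaceSaving(capacity=2, decay=0.5)
ss.update("a", 0)
ss.update("a", 1)
ss.update("b", 1)
ss.update("c", 2)   # table is full, so the smallest counter ("b") is replaced
```

Because every counter decays by the same factor per time unit, decay alone never changes the relative order of two counters; only arrivals do. That property is what makes a lazily maintained ordering plausible in this setting, although how the Quasi-heap exploits it is described in the full article, not in the abstract.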

Notes

Frequent Itemset Mining Dataset Repository, available at http://fimi.cs.helsinki.fi/data/ (last accessed on 17 November, 2016)

Acknowledgments

This work was supported by the National Science and Technology Supporting Plan (2014BAK16B02, 2015BAH45F01), the Cultural Relic Protection Science and Technology Project of Zhejiang Province, grant NSFC 61502548 from the NSF of China, and grants MYRG2014-00106-FST and MYRG2016-00182-FST from UMAC RC.

Author information

Authors and Affiliations

College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Shanshan Wu, Huaizhong Lin, Yunjun Gao & Dongming Lu
Faculty of Science and Technology, University of Macau, Macau, China
Leong Hou U

Corresponding author

Correspondence to Huaizhong Lin.

About this article

Cite this article

Wu, S., Lin, H., U, L.H. et al. Novel structures for counting frequent items in time decayed streams. World Wide Web 20, 1111–1133 (2017). https://doi.org/10.1007/s11280-017-0433-5

Received: 04 October 2016
Revised: 10 December 2016
Accepted: 11 January 2017
Published: 24 January 2017
Issue Date: September 2017
DOI: https://doi.org/10.1007/s11280-017-0433-5
