Using a Real-Time Top-k Algorithm to Mine the Most Frequent Items over Multiple Streams

Wang, Ling; Qu, Zhao Yang; Zhou, Tie Hua; Ryu, Keun Ho

doi:10.1007/978-3-642-39479-9_36

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7995)

Included in the following conference series:

International Conference on Intelligent Computing

Abstract

Applications such as sensor networks, Internet traffic analysis, location-based services, and health measurements must handle stream data that is unbounded, fast, high-volume, continuous, and often distributed. A synopsis, kept as a list of partial summaries of item sets that are not known in advance, reduces memory usage enough to keep pace with such fast, large incoming data. Item sets of different sizes normally lead to different summaries, which matters especially for the Top-k operator when it runs as a partial preprocessing step over the synopsis. We therefore propose a smooth synopsis that dynamically assigns a numeric interval to resolve the item set, so that partial Top-k processing maintains a more accurate list of approximate answers. In particular, we propose an algorithm, called SFI, that mines the most frequent items from specific stream sources in a more adaptive and faster way. Finally, our experimental results demonstrate the accuracy and efficiency of these approximation techniques.
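
The abstract does not specify how SFI or the smooth synopsis work, so the following is only a rough illustration of the general idea of answering a Top-k frequent-items query from bounded-memory partial summaries kept per stream. It substitutes the well-known Space-Saving summary for the paper's method; the names `SpaceSaving`, `top_k_over_streams`, and `capacity` are hypothetical, and the merged counts are approximations rather than exact frequencies.

```python
from collections import Counter


class SpaceSaving:
    """Bounded-memory frequent-item summary (Space-Saving, NOT the paper's SFI)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity  # assumed memory budget: max tracked items
        self.counts = {}          # item -> estimated count for tracked items

    def add(self, item):
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
        else:
            # Evict the smallest counter; the newcomer inherits its count + 1.
            victim = min(self.counts, key=self.counts.get)
            self.counts[item] = self.counts.pop(victim) + 1


def top_k_over_streams(streams, k, capacity):
    """Summarize each stream separately, then merge the partial summaries."""
    merged = Counter()
    for stream in streams:
        summary = SpaceSaving(capacity)
        for item in stream:
            summary.add(item)
        merged.update(summary.counts)  # sums estimated counts per item
    return merged.most_common(k)


streams = [
    ["a", "b", "a", "c", "a", "b"],
    ["b", "a", "d", "a", "b", "a"],
]
result = top_k_over_streams(streams, k=2, capacity=3)
print(result)  # [('a', 6), ('b', 4)]
```

With a `capacity` smaller than the number of distinct items, evictions make the counts overestimates for tracked items, which is the accuracy-versus-memory trade-off that a synopsis-based approach has to manage.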

Author information

Authors and Affiliations

Department of Computer Science and Technology, School of Information Engineering, Northeast Dianli University, Jilin, China
Ling Wang & Zhao Yang Qu
Database/Bioinformatics Laboratory, School of Electrical & Computer Engineering, Chungbuk National University, Chungbuk, Korea
Tie Hua Zhou & Keun Ho Ryu

Editor information

Editors and Affiliations

Machine Learning and Systems Biology Laboratory, Tongji University, 4800 Caoan Road, 201804, Shanghai, China
De-Shuang Huang
Electrical and Electronics Department, Polytechnic of Bari, Via Orabona 4, 70125, Bari, Italy
Vitoantonio Bevilacqua
Faculty of Engineering, District University Francisco José de Caldas, Cra. 7a No. 40-53, Fifth Floor, Bogotá, Colombia
Juan Carlos Figueroa
School of Electrical, Computer and Telecommunications Engineering, The University of Wollongong, 2522, North Wollongong, NSW, Australia
Prashan Premaratne

About this paper

Cite this paper

Wang, L., Qu, Z.Y., Zhou, T.H., Ryu, K.H. (2013). Using a Real-Time Top-k Algorithm to Mine the Most Frequent Items over Multiple Streams. In: Huang, DS., Bevilacqua, V., Figueroa, J.C., Premaratne, P. (eds) Intelligent Computing Theories. ICIC 2013. Lecture Notes in Computer Science, vol 7995. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39479-9_36

DOI: https://doi.org/10.1007/978-3-642-39479-9_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39478-2
Online ISBN: 978-3-642-39479-9
eBook Packages: Computer Science, Computer Science (R0)