Continuous top-k query over a sliding window is a fundamental problem in databases: each time the window slides, the query retrieves the k objects with the highest scores in the window. Existing studies mainly adopt exact algorithms, whose key idea is to maintain a subset of the objects in the window and to retrieve answers from that subset. However, all existing algorithms are sensitive to query parameters and data distribution. In addition, they incur expensive overhead for incremental maintenance and thus cannot satisfy real-time requirements. In this paper, we define a novel query named the (ε, δ)-approximate continuous top-k query, which returns approximate answers for the top-k query. To support this query efficiently, we propose a framework named PABF (Probabilistic Approximate Based Framework). We first maintain a self-adaptive pruning value, which filters out newly arrived objects whose probability of being a query result is less than 1 − δ. Objects that are not filtered out are combined together if the score difference among them is less than a threshold. To maintain these combined results efficiently, PABF also employs a multi-phase merging algorithm. Theoretical analysis indicates that, even in the worst case, maintaining each candidate requires only logarithmic complexity.
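To make the query itself concrete, the following is a minimal sketch of a naive exact baseline for a count-based sliding window. It is not the PABF framework and not any of the existing algorithms discussed in the paper; the function name `continuous_top_k`, its parameters, and the sample stream are hypothetical and chosen only for illustration. The sketch re-scans the whole window on every arrival, which is the kind of per-slide cost that incremental and approximate methods try to avoid.

```python
from collections import deque
import heapq


def continuous_top_k(scores, window_size, k):
    """Yield the k highest scores in the window after each arrival.

    Illustrative baseline only (assumed names, count-based window):
    every slide re-scans the full window.
    """
    # A bounded deque drops the oldest object automatically when full.
    window = deque(maxlen=window_size)
    for score in scores:
        window.append(score)
        # heapq.nlargest returns the k largest items in descending order.
        yield heapq.nlargest(k, window)


# Hypothetical stream of object scores.
stream = [5, 1, 9, 3, 7, 2, 8]
results = list(continuous_top_k(stream, window_size=4, k=2))
for step, answer in enumerate(results, start=1):
    print(step, answer)
```

Each arrival produces one answer, so a stream of seven objects yields seven top-k results. The full re-scan costs time proportional to the window size on every slide, which illustrates why maintaining only a small candidate subset is attractive.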
This work is partially supported by the National Natural Science Fund for Distinguished Young Scholars of China under Grant No. 61322208, the National Basic Research 973 Program of China under Grant No. 2012CB316201, the National Natural Science Foundation of China under Grant Nos. 61272178 and 61572122, and the Key Program of the National Natural Science Foundation of China under Grant No. 61532021.
Zhu, R., Wang, B., Luo, S. et al. Approximate Continuous Top-k Query over Sliding Window. J. Comput. Sci. Technol. 32, 93–109 (2017). https://doi.org/10.1007/s11390-017-1708-0
- continuous top-k query
- sliding window