EPF: A General Framework for Supporting Continuous Top-k Queries Over Streaming Data
The continuous top-k query over a sliding window is a fundamental problem in streaming data management: it monitors the query window and retrieves the k objects with the highest scores each time the window slides. The key to supporting this query is maintaining a subset of the objects in the window and trying to retrieve answers from that subset when the window slides. The state-of-the-art approach, SAP, uses a partition technique to support top-k searches. Its key idea is to find a proper partition so that the query can be supported with as few high-quality candidates as possible. However, it incurs a relatively high computation cost in evaluating whether the partition is proper and in re-scanning the window. In this paper, we propose an ELM-based framework named EPF, which improves SAP by learning the nature of the streaming data. If we learn that the distribution of the streaming data is predictable, we can construct a suitable prediction model that partitions the window more efficiently. Furthermore, we propose a novel algorithm to reduce the re-scanning cost. We conduct a thorough experimental study of this technique on real and synthetic datasets and show a significant performance improvement when the technique is applied to existing algorithms.
Keywords: ELM stream classification top-k
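To make the query defined in the abstract concrete, the following is a minimal sketch in Python. It is not SAP or EPF: it only contrasts a naive re-scan of the window with a generic dominance-based candidate set (an object is dropped once k newer objects with strictly higher scores have arrived, since it can then never re-enter the answer). The count-based window, the plain numeric scores, and the function names `naive_top_k` and `candidate_top_k` are all assumptions made for illustration.

```python
from collections import deque
import heapq


def naive_top_k(stream, window_size, k):
    """Re-scan the whole window on every slide (baseline, assumed count-based window)."""
    window = deque()
    for score in stream:
        window.append(score)
        if len(window) > window_size:
            window.popleft()  # the oldest object expires
        if len(window) == window_size:
            yield heapq.nlargest(k, window)


def candidate_top_k(stream, window_size, k):
    """Answer the same query from a pruned candidate subset of the window.

    Each candidate is [arrival_index, score, dominance_count], where
    dominance_count is the number of newer objects with a strictly higher score.
    """
    candidates = []
    for t, score in enumerate(stream):
        # Drop candidates that have slid out of the window.
        candidates = [c for c in candidates if c[0] > t - window_size]
        # The new arrival dominates every older candidate with a lower score.
        for c in candidates:
            if c[1] < score:
                c[2] += 1
        # A candidate dominated k times can never appear in a future answer.
        candidates = [c for c in candidates if c[2] < k]
        candidates.append([t, score, 0])
        if t + 1 >= window_size:
            yield sorted((c[1] for c in candidates), reverse=True)[:k]


if __name__ == "__main__":
    scores = [5, 1, 9, 3, 7, 2, 8]  # hypothetical stream of object scores
    for answer in candidate_top_k(scores, window_size=3, k=2):
        print(answer)
```

Both generators return the same answers; the second one simply sorts a smaller set on each slide, which is the general motivation behind candidate-maintenance approaches of this kind.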
This work is partially supported by the NSF of China under grant Nos. 61702344, 61272178, 61502317, U1401256, and the NSF of China for Key Program under grant No. 61532021.
Compliance with Ethical Standards
Conflict of interests
The authors declare that they have no potential conflict of interest. This article does not contain any studies involving human participants and/or animals by any of the authors. Informed consent was obtained from all individual participants.