Abstract
Extracting a small, size-controlled subset from a large-scale dataset that still lets users understand the entire dataset is an important problem in multicriteria decision making. In recent years, this problem has been widely studied, and various query models have been proposed, such as top-k, skyline, k-regret and k-coverage queries. Among these models, the k-coverage query is attractive because it offers stability, scale invariance and high traversal efficiency. However, existing methods, including those for k-coverage queries, cannot efficiently return an effective solution set when points are deleted from the dataset. In this paper, we study the robustness of k-coverage queries in two settings that involve the dynamic deletion of data points. In the first, the whole dataset is available in advance (the centralized setting); in the second, the data points arrive in a stream. For a centralized dataset, we introduce a sieving mechanism that uses a precalculated threshold to filter a coreset from the entire dataset. The k-coverage query can then be carried out on this small coreset instead of the entire dataset, and we propose a threshold-based k-coverage query algorithm that greatly accelerates query processing. For a streaming dataset, we adopt a special chain structure and propose a single-pass streaming algorithm named Robust-Sieving. We also extend the coreset-based method to the streaming setting. In addition, sampling techniques are adopted to accelerate query processing in both settings. Extensive experiments verify the effectiveness of the proposed Robust-Sieving algorithm and of the coreset-based algorithms, with and without sampling.
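To make the deletion problem concrete, the following is a minimal sketch of a k-coverage query and of what happens when a selected point is deleted. It rests on assumptions that the abstract does not state: coverage is taken to be the number of points dominated by the selected set (larger attribute values are better), and selection uses the standard greedy heuristic for maximum coverage. The names `dominates`, `coverage` and `greedy_k_coverage` are hypothetical, and this baseline is not the threshold-based or Robust-Sieving algorithm proposed in the paper.

```python
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """Assumed dominance test: p is no worse than q in every attribute
    and strictly better in at least one (larger values are better)."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def coverage(selected: Sequence[Point], points: Sequence[Point]) -> int:
    """Number of points dominated by at least one selected point."""
    return sum(1 for q in points if any(dominates(p, q) for p in selected))


def greedy_k_coverage(points: Sequence[Point], k: int) -> List[Point]:
    """Greedy baseline: repeatedly pick the point with the largest
    marginal gain in coverage, stopping early if no point adds coverage."""
    # dominated[i] holds the indices of the points that points[i] dominates.
    dominated = [
        {j for j, q in enumerate(points) if dominates(p, q)} for p in points
    ]
    chosen: List[int] = []
    covered: Set[int] = set()
    for _ in range(min(k, len(points))):
        best, best_gain = None, 0
        for i in range(len(points)):
            if i in chosen:
                continue
            gain = len(dominated[i] - covered)
            if gain > best_gain:  # ties go to the lowest index
                best, best_gain = i, gain
        if best is None:
            break
        chosen.append(best)
        covered |= dominated[best]
    return [points[i] for i in chosen]


data = [(5, 1), (1, 5), (4, 4), (3, 3), (2, 2), (1, 1), (4, 0.5)]
answer = greedy_k_coverage(data, 1)

# Simulate deleting the selected points: the old answer no longer covers
# anything, so without a robust method the query must be recomputed.
remaining = [p for p in data if p not in answer]
stale = [p for p in answer if p in remaining]
recomputed = greedy_k_coverage(remaining, 2)
```

In this toy dataset a single deletion wipes out the whole answer, and the naive remedy is to rerun the query on all remaining points. The coreset and sieving ideas described in the abstract aim to avoid that full recomputation.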
Availability of data and material
Not applicable.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
About this article
Cite this article
Zheng, J., Huang, X. & Ma, Y. Efficient computation of deletion-robust k-coverage queries. Knowl Inf Syst 63, 759–789 (2021). https://doi.org/10.1007/s10115-020-01540-6
DOI: https://doi.org/10.1007/s10115-020-01540-6