Efficient computation of deletion-robust k-coverage queries

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Extracting a subset of controllable size from a large-scale dataset, so that users can understand the entire dataset, is an important topic in multicriteria decision making. In recent years, this problem has been widely studied, and various query models have been proposed, such as top-k, skyline, k-regret and k-coverage queries. Among these models, the k-coverage query is particularly attractive: it is stable and scale-invariant and offers high traversal efficiency. However, existing methods, including those for k-coverage queries, cannot efficiently return an effective solution set when some points are deleted from the dataset. In this paper, we study the robustness of k-coverage queries to the dynamic deletion of data points in two settings. In the first, the whole dataset is assumed to be available in advance; in the second, the data points arrive as a stream. For a centralized dataset, we introduce a sieving mechanism that uses a precalculated threshold to filter a coreset from the entire dataset. The k-coverage query can then be carried out on this small coreset instead of the entire dataset, and we propose a threshold-based k-coverage query algorithm that greatly accelerates query processing. For a streaming dataset, we adopt a special chain structure and propose a single-pass streaming algorithm named Robust-Sieving. We also extend the coreset-based method to the streaming setting. In addition, we adopt sampling techniques to accelerate query processing in both settings. Extensive experiments verify the effectiveness of the proposed Robust-Sieving algorithm and of the coreset-based algorithms with and without sampling.
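The abstract does not define the coverage function or how the threshold is computed, so the following Python sketch only illustrates the general idea of threshold-based sieving, in the spirit of the Sieve-Streaming approach of [2]. It assumes, hypothetically, that a point covers another point if it is identical to it or dominates it (larger values are better in every attribute), and it uses a fixed threshold `tau` in place of the paper's precalculated threshold. It does not model deletions, the chain structure, or sampling. The names `dominates`, `coverage` and `threshold_sieve` are illustrative, not taken from the paper.

```python
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """Assumed dominance (larger is better): p >= q everywhere and p > q somewhere."""
    return all(a >= b for a, b in zip(p, q)) and any(a > b for a, b in zip(p, q))


def coverage(selected: Sequence[Point], data: Sequence[Point]) -> Set[int]:
    """Indices of points in `data` that equal or are dominated by some selected point."""
    return {
        i
        for i, q in enumerate(data)
        if any(p == q or dominates(p, q) for p in selected)
    }


def threshold_sieve(
    data: Sequence[Point], k: int, tau: int
) -> Tuple[List[Point], int]:
    """Scan candidates once; keep a point if its marginal coverage gain is >= tau.

    Hypothetical fixed-threshold sieve. Each gain evaluation scans the full
    (centralized) dataset; no deletion handling is modeled here.
    """
    chosen: List[Point] = []
    covered: Set[int] = set()
    for p in data:
        if len(chosen) == k:  # budget of k representatives exhausted
            break
        gain = coverage([p], data) - covered  # newly covered indices
        if len(gain) >= tau:
            chosen.append(p)
            covered |= gain
    return chosen, len(covered)
```

For example, with `data = [(3, 3), (1, 1), (2, 1), (1, 2), (0, 5), (5, 0)]`, `threshold_sieve(data, 2, 1)` first keeps `(3, 3)`, which covers four points, then `(0, 5)`, which adds one more, for a total coverage of 5. Raising `tau` to 2 keeps only `(3, 3)`, since no later point adds two or more newly covered points; this shows how a higher threshold yields a smaller candidate set.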

Fig. 1–10 (images not included in this preview)

Availability of data and material

Not applicable.

Notes

  1. https://movielens.umn.edu/.

  2. https://finance.yahoo.com/quote/GE/history?ltr=1.

  3. http://www.kaggle.com/c/forest-cover-type-prediction/.

References

  1. Allenby RBJT, Slomson A (2010) How to count: an introduction to combinatorics, 2nd edn. Chapman & Hall/CRC, Boca Raton

  2. Badanidiyuru A, Mirzasoleiman B, Karbasi A, Krause A (2014) Streaming submodular maximization: Massive data summarization on the fly. In: Proceedings of the international conference on knowledge discovery and data mining (SIGKDD), pp 671–680

  3. Bai M, Xin J, Wang G, Zhang L, Zimmermann R, Yuan Y, Wu X (2016) Discovering the k representative skyline over a sliding window. IEEE Trans Knowl Data Eng (TKDE) 28(8):2041–2056

  4. Börzsönyi S, Kossmann D, Stocker K (2001) The skyline operator. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 421–430

  5. Boykov YY, Jolly M-P (2001) Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 105–112

  6. Chan C-Y, Jagadish HV, Tan K-L, Tung AKH, Zhang Z (2006) Finding k-dominant skylines in high dimensional space. In: Proceedings of the international conference on management of data (SIGMOD), pp 503–514

  7. Chan C-Y, Jagadish HV, Tan K-L, Tung AKH, Zhang Z (2006) On high dimensional skylines. In: Proceedings of the international conference on extending database technology (EDBT), pp 478–495

  8. Chester S, Thomo A, Venkatesh S, Whitesides S (2014) Computing k-regret minimizing sets. In: Proceedings of the international conference on very large data bases (VLDB), pp 389–400

  9. Faulkner TK, Brackenbury W, Lall A (2015) k-regret queries with nonlinear utilities. In: Proceedings of the international conference on very large data bases (VLDB), pp 2098–2109

  10. Feldman M, Karbasi A, Kazemi E (2018) Do less, get more: streaming submodular maximization with subsampling. In: Proceedings of advances in neural information processing systems (NIPS), pp 732–742

  11. Gomes R, Krause A (2010) Budgeted nonparametric learning from data streams. In: Proceedings of the international conference on machine learning (ICML), pp 391–398

  12. Huang X, Zheng J (2019) Deletion-robust k-coverage queries. In: Proceedings of international conference on database systems for advanced applications (DASFAA), pp 215–219

  13. Huang X, Zheng J (2019) Streaming deletion-robust k-coverage queries. In: Proceedings of the international symposium on spatial and temporal databases (SSTD), pp 170–173

  14. Ilyas IF, Beskales G, Soliman MA (2008) A survey of top-k query processing techniques in relational database systems. ACM Comput Surv (CSUR) 40(4):1–58

  15. Kazemi E, Zadimoghaddam M, Karbasi A (2018) Scalable deletion-robust submodular maximization: data summarization with privacy and fairness constraints. In: Proceedings of the international conference on machine learning (ICML), pp 2549–2558

  16. Krause A, Singh A, Guestrin C (2008) Near-optimal sensor placements in gaussian processes: theory, efficient algorithms and empirical studies. J Mach Learn Res (JMLR) 9:235–284

  17. Lee J, You G-W, Hwang S-W (2009) Personalized top-k skyline queries in high-dimensional space. Inf Syst 34(1):45–61

  18. Lian X, Chen L (2009) Top-k dominating queries in uncertain databases. In: Proceedings of the international conference on extending database technology (EDBT), pp 660–671

  19. Lin H, Bilmes J (2011) A class of submodular functions for document summarization. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies (HLT), pp 510–520

  20. Lin X, Yuan Y, Wei W, Lu H (2005) Stabbing the sky: efficient skyline computation over sliding windows. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 502–513

  21. Lin X, Yuan Y, Zhang Q, Zhang Y (2007) Selecting stars: the k most representative skyline operator. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 86–95

  22. Lu H, Jensen CS, Zhang Z (2011) Flexible and efficient resolution of skyline query size constraints. IEEE Trans Knowl Data Eng (TKDE) 23(7):991–1005

  23. Magnani M, Assent I, Mortensen ML (2014) Taking the big picture: representative skylines based on significance and diversity. VLDB J 23(5):795–815

  24. Mindolin D, Chomicki J (2009) Discovering relative importance of skyline attributes. In: Proceedings of the international conference on very large data bases (VLDB), pp 610–621

  25. Mirzasoleiman B, Badanidiyuru A, Karbasi A, Vondrak J, Krause A (2015) Lazier than lazy greedy. In: Proceedings of the AAAI conference on artificial intelligence (AAAI), pp 1812–1818

  26. Mirzasoleiman B, Karbasi A, Krause A (2017) Deletion-robust submodular maximization: Data summarization with “the right to be forgotten”. In: Proceedings of the international conference on machine learning (ICML), pp 2449–2458

  27. Nanongkai D, Lall A, Sarma AD, Makino K (2012) Interactive regret minimization. In: Proceedings of the international conference on management of data (SIGMOD), pp 109–120

  28. Nanongkai D, Sarma AD, Lall A, Lipton RJ, Xu J (2010) Regret-minimizing representative databases. In: Proceedings of the international conference on very large data bases (VLDB), pp 1114–1124

  29. Nemhauser GL, Wolsey LA, Fisher ML (1978) An analysis of approximations for maximizing submodular set functions–I. Math Program 14(1):265–294

  30. Orlin JB, Schulz AS, Udwani R (2016) Robust monotone submodular function maximization. In: Louveaux Q, Skutella M (eds) Integer programming and combinatorial optimization, pp 312–324

  31. Papadias D, Tao Y, Fu G, Seeger B (2003) An optimal and progressive algorithm for skyline queries. In: Proceedings of the international conference on management of data (SIGMOD), pp 467–478

  32. Papadias D, Tao Y, Fu G, Seeger B (2005) Progressive skyline computation in database systems. ACM Trans Database Syst (TODS) 30(1):41–82

  33. Peng P, Wong RC-W (2014) Geometry approach for k-regret query. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 772–783

  34. Qi J, Zuo F, Samet H, Yao JC (2018) K-regret queries using multiplicative utility functions. ACM Trans Database Syst (TODS) 43(2):10:1–10:41

  35. Søholm M, Chester S, Assent I (2016) Maximum coverage representative skyline. In: Proceedings of the international conference on extending database technology (EDBT), pp 702–703

  36. Soliman MA, Ilyas IF, Chang KC (2007) Top-k query processing in uncertain databases. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 896–905

  37. Tao Y, Ding L, Lin X, Pei J (2009) Distance-based representative skyline. In: Proceedings of the IEEE international conference on data engineering (ICDE), pp 892–903

  38. Wang S, Cheema MA, Zhang Y, Lin X (2015) Selecting representative objects considering coverage and diversity. In: Proceedings of the international ACM workshop on managing and mining enriched geo-spatial data (GeoRich), pp 31–38

  39. Xie M, Wong RC-W, Lall A (2019) Strongly truthful interactive regret minimization. In: Proceedings of the international conference on management of data (SIGMOD), pp 281–298

  40. Xie M, Wong RC-W, Lall A (2020) An experimental survey of regret minimization query and variants: bridging the best worlds between top-k query and skyline query. VLDB J 29:147–175

  41. Xie M, Wong RC-W, Li J, Long C, Lall A (2018) Efficient k-regret query algorithm with restriction-free bound for any dimensionality. In: Proceedings of the international conference on management of data (SIGMOD), pp 959–974

  42. Tao Y, Papadias D (2006) Maintaining sliding window skylines on data streams. IEEE Trans Knowl Data Eng (TKDE) 18(3):377–391

  43. Zeighami S, Wong RC-W (2016) Minimizing average regret ratio in database. In: Proceedings of the international conference on management of data (SIGMOD), pp 2265–2266

Acknowledgements

This research is partially supported by the National Natural Science Foundation of China under Grant Nos. U1733112 and 61702260 and the Fundamental Research Funds for the Central Universities under Grant No. NS2020068. This paper is an extended version of our previous conference papers [12, 13].

Author information

Corresponding author

Correspondence to Jiping Zheng.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zheng, J., Huang, X. & Ma, Y. Efficient computation of deletion-robust k-coverage queries. Knowl Inf Syst 63, 759–789 (2021). https://doi.org/10.1007/s10115-020-01540-6

  • DOI: https://doi.org/10.1007/s10115-020-01540-6
