Package queries: efficient and scalable computation of high-order constraints

Brucato, Matteo; Abouzied, Azza; Meliou, Alexandra

doi:10.1007/s00778-017-0483-4

Special Issue Paper
Published: 24 October 2017

The VLDB Journal, Volume 27, pages 693–718 (2018)

Abstract

Traditional database queries follow a simple model: they define constraints that each tuple in the result must satisfy. This model is computationally efficient, as the database system can evaluate the query conditions on each tuple individually. However, many practical, real-world problems require a collection of result tuples to satisfy constraints collectively, rather than individually. In this paper, we present package queries, a new query model that extends traditional database queries to handle complex constraints and preferences over answer sets. We develop a full-fledged package query system, implemented on top of a traditional database engine. Our work makes several contributions. (1) We design PaQL, a SQL-based query language that supports the declarative specification of package queries. We prove that PaQL is at least as expressive as integer linear programming, and therefore, evaluation of package queries is NP-hard. (2) We present a fundamental evaluation strategy that combines the capabilities of databases and constraint optimization solvers to derive solutions to package queries. The core of our approach is a set of translation rules that transform a package query to an integer linear program. (3) We introduce an offline data partitioning strategy allowing query evaluation to scale to large data sizes. (4) We introduce SketchRefine, a scalable algorithm for package evaluation, with strong approximation guarantees (a \((1 \pm \varepsilon)\)-factor approximation). (5) We present a method for parallelizing the Refine phase of SketchRefine. (6) We present an empirical study of the efficiency gains of providing integer solvers with starting solutions. (7) We present extensive experiments over real-world and benchmark data. The results demonstrate that our methods are effective at deriving high-quality package results and achieve runtime performance that is an order of magnitude faster than directly using ILP solvers over large datasets.
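The difference between per-tuple and collective constraints can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration: the toy relation, its column names, and the bounds are hypothetical, and exhaustive enumeration stands in for the ILP translation mentioned in the abstract. It takes time exponential in the data size and is not the evaluation strategy of the paper.

```python
from itertools import combinations

# Hypothetical toy relation: (name, kcal, fat_grams). All values are made up.
RECIPES = [
    ("salad", 400, 10),
    ("pasta", 900, 40),
    ("soup", 600, 20),
    ("steak", 1100, 90),
    ("rice", 700, 5),
]


def best_package(rows, size, kcal_lo, kcal_hi):
    """Return the package of `size` rows whose total kcal lies in
    [kcal_lo, kcal_hi] and whose total fat is minimal, or None if no
    package is feasible. The constraints apply to the package as a whole,
    so no per-tuple filter can express them."""
    best, best_fat = None, None
    for package in combinations(rows, size):
        total_kcal = sum(row[1] for row in package)
        if not (kcal_lo <= total_kcal <= kcal_hi):
            continue  # collective constraint violated
        total_fat = sum(row[2] for row in package)
        if best_fat is None or total_fat < best_fat:
            best, best_fat = package, total_fat
    return best


package = best_package(RECIPES, size=3, kcal_lo=2000, kcal_hi=2500)
names = [row[0] for row in package]
total_fat = sum(row[2] for row in package)
print(names, total_fat)
```

In ILP terms, each row would get a 0/1 variable, the size and kcal conditions would become linear constraints over those variables, and the total fat would become the linear objective. The enumeration above only checks the same conditions directly.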

Notes

The evaluation of non-recursive SQL queries is polynomial with respect to data complexity. When we discuss complexity in this paper, we refer to data complexity.
This syntax slightly differs from the one presented in [6].
For ease of presentation, we show an ILP with nonnegative variables, but the mapping generalizes to arbitrary integer variables: negative variables negate the corresponding values in the query; for arbitrary bounds on each variable, add cardinality constraints to individual tuples.
http://cas.sdss.org/dr12/en/help/docs/realquery.aspx.
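The generalization described in the third note above can be written out as a change of variables. The notation here (coefficients \(c_j\), \(a_{ij}\), \(b_i\), variables \(x_j\), \(y_j\), bounds \(\ell_j\), \(u_j\)) is assumed for illustration and is not taken from the paper; it is a sketch of the idea, not the authors' proof.

```latex
% Assumed notation: an ILP over integer variables x_1, ..., x_n.
\[
  \max \sum_{j} c_j x_j
  \quad \text{s.t.} \quad
  \sum_{j} a_{ij} x_j \le b_i \;\; \forall i,
  \qquad x_j \in \mathbb{Z}.
\]
% A nonpositive variable x_j <= 0 is replaced by y_j = -x_j >= 0,
% which negates the values attached to that variable:
\[
  c_j x_j = (-c_j)\, y_j,
  \qquad
  a_{ij} x_j = (-a_{ij})\, y_j,
  \qquad y_j \ge 0 .
\]
% Bounds on a single nonnegative variable become a cardinality
% constraint on how often the corresponding tuple may appear:
\[
  \ell_j \le x_j \le u_j, \qquad 0 \le \ell_j \le u_j .
\]
```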

References

1. Basu Roy, S., Amer-Yahia, S., Chawla, A., Das, G., Yu, C.: Constructing and exploring composite items. In: SIGMOD, pp. 843–854 (2010)
2. Baykasoglu, A., Dereli, T., Das, S.: Project team selection using fuzzy optimization approach. Cybern. Syst. 38(2), 155–185 (2007)
3. Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)
4. Bisschop, J.: AIMMS Optimization Modeling. Paragon Decision Technology, Haarlem (2006)
5. Bonatti, P., Calimeri, F., Leone, N., Ricca, F.: A 25-year perspective on logic programming. Chapter Answer Set Programming, pp. 159–182. Springer, Berlin (2010)
6. Brucato, M., Beltran, J.F., Abouzied, A., Meliou, A.: Scalable package queries in relational database systems. PVLDB 9(7), 576–587 (2016)
7. Brucato, M., Ramakrishna, R., Abouzied, A., Meliou, A.: PackageBuilder: from tuples to packages. PVLDB 7(13), 1593–1596 (2014)
8. Cook, W., Hartmann, M.: On the complexity of branch and cut methods for the traveling salesman problem. Polyhedral Comb. 1, 75–82 (1990)
9. De Choudhury, M., Feldman, M., Amer-Yahia, S., Golbandi, N., Lempel, R., Yu, C.: Automatic construction of travel itineraries using social breadcrumbs. In: HyperText, pp. 35–44 (2010)
10. Deng, T., Fan, W., Geerts, F.: On the complexity of package recommendation problems. In: PODS, pp. 261–272 (2012)
11. Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD, pp. 226–231 (1996)
12. Fang, M., Shivakumar, N., Garcia-Molina, H., Motwani, R., Ullman, J.D.: Computing iceberg queries efficiently. In: VLDB’98, Proceedings of 24th International Conference on Very Large Data Bases, Aug 24–27, 1998, New York City, New York, USA, pp. 299–310 (1998)
13. Finkel, R.A., Bentley, J.L.: Quad trees: a data structure for retrieval on composite keys. Acta Inform. 4(1), 1–9 (1974)
14. Gebser, M., Kaminski, R., Kaufmann, B., Schaub, T.: Clingo = ASP + control: Preliminary report. In: Leuschel, M., Schrijvers, T. (eds.) Technical Communications of the Thirtieth International Conference on Logic Programming (ICLP’14), volume arXiv:1405.3694v1 (2014). Theory and Practice of Logic Programming, Online Supplement
15. GNU Bison. https://www.gnu.org/software/bison/
16. Goemans, M.X., Williamson, D.P.: The primal-dual method for approximation algorithms and its application to network design problems. In: Approximation Algorithms for NP-Hard Problems, pp. 144–191 (1997)
17. Guha, S., Gunopulos, D., Koudas, N., Srivastava, D., Vlachos, M.: Efficient approximation of optimization queries under parametric aggregation constraints. In: VLDB, pp. 778–789 (2003)
18. Hartigan, J.A., Wong, M.A.: Algorithm AS 136: a k-means clustering algorithm. Appl. Stat. 28, 100–108 (1979)
19. Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)
20. IBM CPLEX Optimization Studio. http://www.ibm.com/software/commerce/optimization/cplex-optimizer/
21. Kalinin, A., Çetintemel, U., Zdonik, S.: Interactive data exploration using semantic windows. In: SIGMOD, pp. 505–516 (2014)
22. Kalinin, A., Çetintemel, U., Zdonik, S.B.: Searchlight: enabling integrated search and exploration over large multidimensional data. PVLDB 8(10), 1094–1105 (2015)
23. Kanellakis, P.C., Kuper, G.M., Revesz, P.Z.: Constraint query languages. J. Comput. Syst. Sci. 51(1), 26–52 (1995)
24. Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis, vol. 344. Wiley, Hoboken (2009)
25. Laporte, M., Novelli, N., Cicchetti, R., Lakhal, L.: Computing full and iceberg datacubes using partitions. In: Foundations of Intelligent Systems, 13th International Symposium, ISMIS 2002, Lyon, France, June 27–29, 2002, Proceedings, pp. 244–254 (2002)
26. Lappas, T., Liu, K., Terzi, E.: Finding a team of experts in social networks. In: SIGKDD, pp. 467–476 (2009)
27. Meliou, A., Suciu, D.: Tiresias: the database oracle for how-to queries. In: SIGMOD, pp. 337–348 (2012)
28. Mirzasoleiman, B., Karbasi, A., Sarkar, R., Krause, A.: Distributed submodular maximization: identifying representative elements in massive data. In: NIPS (2013)
29. Ng, R.T., Wagner, A.S., Yin, Y.: Iceberg-cube computation with PC clusters. In: Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, Santa Barbara, CA, USA, May 21–24, 2001, pp. 25–36 (2001)
30. Padberg, M., Rinaldi, G.: A branch-and-cut algorithm for the resolution of large-scale symmetric traveling salesman problems. SIAM Rev. 33(1), 60–100 (1991)
31. Parameswaran, A.G., Venetis, P., Garcia-Molina, H.: Recommendation systems with complex constraints: a course recommendation perspective. ACM TOIS 29(4), 1–33 (2011)
32. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
33. Pinel, F., Varshney, L.R.: Computational creativity for culinary recipes. In: CHI, pp. 439–442 (2014)
34. The Sloan Digital Sky Survey. http://www.sdss.org/
35. The TPC-H Benchmark. http://www.tpc.org/tpch/
36. Williamson, D.P., Shmoys, D.B.: The Design of Approximation Algorithms. Cambridge University Press, Cambridge (2011)
37. Xie, M., Lakshmanan, L.V.S., Wood, P.T.: Breaking out of the box of recommendations: from items to packages. In: Proceedings of the 2010 ACM Conference on Recommender Systems, RecSys 2010, Barcelona, Spain, Sep 26–30, 2010, pp. 151–158 (2010)
38. Xie, M., Lakshmanan, L.V.S., Wood, P.T.: Composite recommendations: from items to packages. Front. Comput. Sci. 6(3), 264–277 (2012)
39. Xie, M., Lakshmanan, L.V.S., Wood, P.T.: Generating top-k packages via preference elicitation. PVLDB 7(14), 1941–1952 (2014)

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grants IIS-1420941, IIS-1421322, and IIS-1453543.

Author information

Authors and Affiliations

College of Information and Computer Sciences, University of Massachusetts, Amherst, MA, USA
Matteo Brucato & Alexandra Meliou
Computer Science, New York University, Abu Dhabi, UAE
Azza Abouzied

Corresponding author

Correspondence to Matteo Brucato.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 194 KB)

About this article

Cite this article

Brucato, M., Abouzied, A. & Meliou, A. Package queries: efficient and scalable computation of high-order constraints. The VLDB Journal 27, 693–718 (2018). https://doi.org/10.1007/s00778-017-0483-4

Received: 23 January 2017
Revised: 25 August 2017
Accepted: 22 September 2017
Published: 24 October 2017
Issue Date: October 2018
DOI: https://doi.org/10.1007/s00778-017-0483-4
