
Package queries: efficient and scalable computation of high-order constraints

  • Special Issue Paper
  • The VLDB Journal

Abstract

Traditional database queries follow a simple model: they define constraints that each tuple in the result must satisfy. This model is computationally efficient, as the database system can evaluate the query conditions on each tuple individually. However, many practical, real-world problems require a collection of result tuples to satisfy constraints collectively, rather than individually. In this paper, we present package queries, a new query model that extends traditional database queries to handle complex constraints and preferences over answer sets. We develop a full-fledged package query system, implemented on top of a traditional database engine. Our work makes several contributions. (1) We design PaQL, a SQL-based query language that supports the declarative specification of package queries. We prove that PaQL is at least as expressive as integer linear programming, and therefore, evaluation of package queries is NP-hard. (2) We present a fundamental evaluation strategy that combines the capabilities of databases and constraint optimization solvers to derive solutions to package queries. The core of our approach is a set of translation rules that transform a package query to an integer linear program. (3) We introduce an offline data partitioning strategy allowing query evaluation to scale to large data sizes. (4) We introduce SketchRefine, a scalable algorithm for package evaluation, with strong approximation guarantees [a $(1 \pm \varepsilon)$-factor approximation]. (5) We present a method for parallelizing the Refine phase of SketchRefine. (6) We present an empirical study of the efficiency gains of providing integer solvers with starting solutions. (7) We present extensive experiments over real-world and benchmark data. The results demonstrate that our methods are effective at deriving high-quality package results and achieve runtime performance that is an order of magnitude faster than directly using ILP solvers over large datasets.
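To make contribution (2) more concrete, here is a minimal sketch, not the paper's implementation, of how a simple package query could be translated into an integer linear program. Everything in it is an assumption for illustration: the `Recipes` data, the PaQL-style query in the comment, and the helper names `build_ilp` and `solve_by_enumeration`. It assumes that `REPEAT 0` forbids repeated tuples, so each candidate tuple gets a binary variable; the base predicate filters tuples individually, and each global predicate becomes one linear constraint over those variables. Exhaustive enumeration stands in for a real ILP solver and is only viable for a handful of tuples.

```python
from itertools import product

# Hypothetical PaQL-style query (illustrative, not taken from the paper):
#   SELECT PACKAGE(R) AS P FROM Recipes R REPEAT 0
#   WHERE R.gluten_free
#   SUCH THAT COUNT(P.*) = 3 AND SUM(P.kcal) BETWEEN 2000 AND 2500
#   MINIMIZE SUM(P.fat)

# Hypothetical data: (name, gluten_free, kcal, fat)
RECIPES = [
    ("oatmeal", True, 600, 3),
    ("salad", True, 400, 1),
    ("steak", True, 900, 8),
    ("pasta", False, 800, 4),
    ("rice bowl", True, 700, 2),
    ("soup", True, 500, 2),
]


def build_ilp(rows):
    """Translate the query into an ILP with one 0/1 variable x_i per candidate tuple."""
    # Base predicate (WHERE): evaluated tuple by tuple, as in ordinary SQL.
    candidates = [r for r in rows if r[1]]
    # Global predicates (SUCH THAT): linear constraints as (coefficients, lower, upper).
    constraints = [
        ([1] * len(candidates), 3, 3),               # COUNT(P.*) = 3
        ([r[2] for r in candidates], 2000, 2500),    # SUM(P.kcal) BETWEEN 2000 AND 2500
    ]
    objective = [r[3] for r in candidates]           # MINIMIZE SUM(P.fat)
    return candidates, constraints, objective


def solve_by_enumeration(n, constraints, objective):
    """Exhaustive stand-in for an ILP solver; only practical for very small n."""
    best_value, best_x = None, None
    for x in product((0, 1), repeat=n):  # binary variables: no tuple used twice
        if all(lo <= sum(a * v for a, v in zip(coef, x)) <= hi
               for coef, lo, hi in constraints):
            value = sum(c * v for c, v in zip(objective, x))
            if best_value is None or value < best_value:
                best_value, best_x = value, x
    return best_value, best_x


candidates, constraints, objective = build_ilp(RECIPES)
value, x = solve_by_enumeration(len(candidates), constraints, objective)
package = [row[0] for row, chosen in zip(candidates, x) if chosen]
print(value, package)  # 11 ['salad', 'steak', 'rice bowl']
```

Replacing `solve_by_enumeration` with a call to an actual ILP solver would leave `build_ilp` unchanged; the translation step, not the search, is what the sketch is meant to illustrate.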

Notes

  1. The evaluation of non-recursive SQL queries is polynomial with respect to data complexity. When we discuss complexity in this paper, we refer to data complexity.

  2. This syntax slightly differs from the one presented in [6].

  3. For ease of presentation, we show an ILP with nonnegative variables, but the mapping generalizes to arbitrary integer variables: negative variables negate the corresponding values in the query; for arbitrary bounds on each variable, add cardinality constraints to individual tuples.

  4. http://cas.sdss.org/dr12/en/help/docs/realquery.aspx.
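Note 3 refers to the mapping from an integer linear program to a package query. The following minimal sketch reflects our own reading of that kind of construction and should not be taken as the paper's proof: each ILP variable $x_j$ becomes one tuple storing column $j$ of $A$ and the objective coefficient $c_j$, and the multiplicity of that tuple in the package equals the value of $x_j$, so each `SUM` over the package reproduces one row of $Ax$. The helper names (`ilp_to_relation`, `negate_column`, and so on) and the attribute names `a0`, `a1`, `obj` are hypothetical. The sketch also illustrates the two generalizations the note mentions: negating a variable's values to handle a nonpositive variable, and per-tuple cardinality bounds.

```python
from collections import Counter


def ilp_to_relation(A, c):
    """One tuple per ILP variable j: attribute a{i} stores A[i][j], 'obj' stores c[j]."""
    return [
        {**{f"a{i}": row[j] for i, row in enumerate(A)}, "obj": c[j]}
        for j in range(len(c))
    ]


def package_from_assignment(x):
    """A nonnegative integer assignment becomes a multiset: tuple j appears x[j] times."""
    return Counter({j: v for j, v in enumerate(x) if v > 0})


def package_sum(relation, package, attr):
    """SUM(P.attr) over the multiset package."""
    return sum(relation[j][attr] * k for j, k in package.items())


def satisfies(relation, package, b, upper=None):
    """Check SUM(P.a{i}) <= b[i] per ILP row, plus optional per-tuple cardinality bounds."""
    rows_ok = all(package_sum(relation, package, f"a{i}") <= bi for i, bi in enumerate(b))
    # Counter returns 0 for tuples absent from the package.
    bounds_ok = upper is None or all(package[j] <= u for j, u in enumerate(upper))
    return rows_ok and bounds_ok


def negate_column(A, c, j):
    """Substitute x_j = -y_j with y_j >= 0 by negating variable j's values."""
    A2 = [[-v if k == j else v for k, v in enumerate(row)] for row in A]
    c2 = [-v if k == j else v for k, v in enumerate(c)]
    return A2, c2


# Nonnegative case: A x <= b with x = (3, 2).
A, b, c = [[2, 3], [1, -1]], [12, 2], [3, 4]
rel = ilp_to_relation(A, c)
pkg = package_from_assignment([3, 2])
print(package_sum(rel, pkg, "obj"), satisfies(rel, pkg, b))  # 17 True

# Nonpositive x_1 = -2: negate column 1 and use y_1 = 2 in the package instead.
A_neg, c_neg = negate_column(A, c, 1)
rel_neg = ilp_to_relation(A_neg, c_neg)
pkg_neg = package_from_assignment([3, 2])
print([package_sum(rel_neg, pkg_neg, f"a{i}") for i in range(2)])  # [0, 5]
```

The second check yields the same row values as evaluating $Ax$ directly at $x = (3, -2)$, which is the point of the negation step.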

References

  1. Basu Roy, S., Amer-Yahia, S., Chawla, A., Das, G., Yu, C.: Constructing and exploring composite items. In: SIGMOD, pp 843–854 (2010)

  2. Baykasoglu, A., Dereli, T., Das, S.: Project team selection using fuzzy optimization approach. Cybern. Syst. 38(2), 155–185 (2007)

  3. Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)

  4. Bisschop, J.: AIMMS Optimization Modeling. Paragon Decision Technology, Haarlem (2006)

  5. Bonatti, P., Calimeri, F., Leone, N., Ricca, F.: Answer set programming. In: A 25-Year Perspective on Logic Programming, pp. 159–182. Springer, Berlin (2010)

  6. Brucato, M., Beltran, J.F., Abouzied, A., Meliou, A.: Scalable package queries in relational database systems. PVLDB 9(7), 576–587 (2016)

  7. Brucato, M., Ramakrishna, R., Abouzied, A., Meliou, A.: PackageBuilder: from tuples to packages. PVLDB 7(13), 1593–1596 (2014)

  8. Cook, W., Hartmann, M.: On the complexity of branch and cut methods for the traveling salesman problem. Polyhedral Comb. 1, 75–82 (1990)

  9. De Choudhury, M., Feldman, M., Amer-Yahia, S., Golbandi, N., Lempel, R., Yu, C.: Automatic construction of travel itineraries using social breadcrumbs. In: HyperText, pp. 35–44 (2010)

  10. Deng, T., Fan, W., Geerts, F.: On the complexity of package recommendation problems. In: PODS, pp. 261–272 (2012)

  11. Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD, pp. 226–231 (1996)

  12. Fang, M., Shivakumar, N., Garcia-Molina, H., Motwani, R., Ullman, J.D.: Computing iceberg queries efficiently. In: VLDB, pp. 299–310 (1998)

  13. Finkel, R.A., Bentley, J.L.: Quad trees: a data structure for retrieval on composite keys. Acta Inform. 4(1), 1–9 (1974)

  14. Gebser, M., Kaminski, R., Kaufmann, B., Schaub, T.: Clingo = ASP + control: preliminary report. In: Leuschel, M., Schrijvers, T. (eds.) Technical Communications of the Thirtieth International Conference on Logic Programming (ICLP’14). Theory and Practice of Logic Programming, Online Supplement (2014). arXiv:1405.3694v1

  15. GNU Bison. https://www.gnu.org/software/bison/

  16. Goemans, M.X., Williamson, D.P.: The primal-dual method for approximation algorithms and its application to network design problems. In: Approximation Algorithms for NP-Hard Problems, pp. 144–191 (1997)

  17. Guha, S., Gunopulos, D., Koudas, N., Srivastava, D., Vlachos, M.: Efficient approximation of optimization queries under parametric aggregation constraints. In: VLDB, pp. 778–789 (2003)

  18. Hartigan, J.A., Wong, M.A.: Algorithm AS 136: a k-means clustering algorithm. Appl. Stat. 28, 100–108 (1979)

  19. Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)

  20. IBM CPLEX Optimization Studio. http://www.ibm.com/software/commerce/optimization/cplex-optimizer/

  21. Kalinin, A., Çetintemel, U., Zdonik, S.: Interactive data exploration using semantic windows. In: SIGMOD, pp. 505–516 (2014)

  22. Kalinin, A., Çetintemel, U., Zdonik, S.B.: Searchlight: enabling integrated search and exploration over large multidimensional data. PVLDB 8(10), 1094–1105 (2015)

  23. Kanellakis, P.C., Kuper, G.M., Revesz, P.Z.: Constraint query languages. J. Comput. Syst. Sci. 51(1), 26–52 (1995)

  24. Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis, vol. 344. Wiley, Hoboken (2009)

  25. Laporte, M., Novelli, N., Cicchetti, R., Lakhal, L.: Computing full and iceberg datacubes using partitions. In: ISMIS, pp. 244–254 (2002)

  26. Lappas, T., Liu, K., Terzi, E.: Finding a team of experts in social networks. In: SIGKDD, pp. 467–476 (2009)

  27. Meliou, A., Suciu, D.: Tiresias: the database oracle for how-to queries. In: SIGMOD, pp. 337–348 (2012)

  28. Mirzasoleiman, B., Karbasi, A., Sarkar, R., Krause, A.: Distributed submodular maximization: identifying representative elements in massive data. In: NIPS (2013)

  29. Ng, R.T., Wagner, A.S., Yin, Y.: Iceberg-cube computation with PC clusters. In: SIGMOD, pp. 25–36 (2001)

  30. Padberg, M., Rinaldi, G.: A branch-and-cut algorithm for the resolution of large-scale symmetric traveling salesman problems. SIAM Rev. 33(1), 60–100 (1991)

  31. Parameswaran, A.G., Venetis, P., Garcia-Molina, H.: Recommendation systems with complex constraints: a course recommendation perspective. ACM TOIS 29(4), 1–33 (2011)

  32. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  33. Pinel, F., Varshney, L.R.: Computational creativity for culinary recipes. In: CHI, pp. 439–442 (2014)

  34. The Sloan Digital Sky Survey. http://www.sdss.org/

  35. The TPC-H Benchmark. http://www.tpc.org/tpch/

  36. Williamson, D.P., Shmoys, D.B.: The Design of Approximation Algorithms. Cambridge University Press, Cambridge (2011)

  37. Xie, M., Lakshmanan, L.V.S., Wood, P.T.: Breaking out of the box of recommendations: from items to packages. In: RecSys, pp. 151–158 (2010)

  38. Xie, M., Lakshmanan, L.V.S., Wood, P.T.: Composite recommendations: from items to packages. Front. Comput. Sci. 6(3), 264–277 (2012)

  39. Xie, M., Lakshmanan, L.V.S., Wood, P.T.: Generating top-k packages via preference elicitation. PVLDB 7(14), 1941–1952 (2014)

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grants IIS-1420941, IIS-1421322, and IIS-1453543.

Author information

Correspondence to Matteo Brucato.

Electronic supplementary material

Supplementary material 1 (PDF, 194 KB)

About this article

Cite this article

Brucato, M., Abouzied, A. & Meliou, A. Package queries: efficient and scalable computation of high-order constraints. The VLDB Journal 27, 693–718 (2018). https://doi.org/10.1007/s00778-017-0483-4

  • DOI: https://doi.org/10.1007/s00778-017-0483-4
