Abstract
Traditional database queries follow a simple model: they define constraints that each tuple in the result must satisfy. This model is computationally efficient, as the database system can evaluate the query conditions on each tuple individually. However, many practical, real-world problems require a collection of result tuples to satisfy constraints collectively, rather than individually. In this paper, we present package queries, a new query model that extends traditional database queries to handle complex constraints and preferences over answer sets. We develop a full-fledged package query system, implemented on top of a traditional database engine. Our work makes several contributions. (1) We design PaQL, a SQL-based query language that supports the declarative specification of package queries. We prove that PaQL is at least as expressive as integer linear programming, and therefore, evaluation of package queries is NP-hard. (2) We present a fundamental evaluation strategy that combines the capabilities of databases and constraint optimization solvers to derive solutions to package queries. The core of our approach is a set of translation rules that transform a package query to an integer linear program. (3) We introduce an offline data partitioning strategy allowing query evaluation to scale to large data sizes. (4) We introduce SketchRefine, a scalable algorithm for package evaluation, with strong approximation guarantees [\((1 \pm \varepsilon )\)-factor approximation]. (5) We present a method for parallelizing the Refine phase of SketchRefine. (6) We present an empirical study of the efficiency gains of providing integer solvers with starting solutions. (7) We present extensive experiments over real-world and benchmark data. The results demonstrate that our methods are effective at deriving high-quality package results and achieve runtime performance that is an order of magnitude faster than directly using ILP solvers over large datasets.
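To make the distinction between per-tuple and collective constraints concrete, the following is a minimal sketch in Python. It is not the paper's evaluation strategy (which translates the query to an integer linear program) and it is not SketchRefine. It is a naive enumeration over a tiny, hypothetical table, and the relation `recipes`, its attributes, and the function `best_package` are all illustrative assumptions.

```python
from itertools import combinations

# Hypothetical toy relation; names and values are illustrative only.
recipes = [
    {"name": "A", "gluten_free": True,  "kcal": 900, "sat_fat": 5},
    {"name": "B", "gluten_free": True,  "kcal": 800, "sat_fat": 1},
    {"name": "C", "gluten_free": False, "kcal": 700, "sat_fat": 0},
    {"name": "D", "gluten_free": True,  "kcal": 600, "sat_fat": 2},
    {"name": "E", "gluten_free": True,  "kcal": 700, "sat_fat": 3},
]


def best_package(rows, size, kcal_lo, kcal_hi):
    """Return (package, total_sat_fat) minimizing saturated fat, or (None, inf)."""
    # Per-tuple constraint: can be checked on each row independently.
    base = [r for r in rows if r["gluten_free"]]

    best, best_cost = None, float("inf")
    # Collective constraints: must be checked on each candidate set as a whole.
    for pkg in combinations(base, size):
        total_kcal = sum(r["kcal"] for r in pkg)
        if not (kcal_lo <= total_kcal <= kcal_hi):
            continue  # the set as a whole violates the calorie bound
        cost = sum(r["sat_fat"] for r in pkg)
        if cost < best_cost:
            best, best_cost = pkg, cost
    return best, best_cost


package, fat = best_package(recipes, size=3, kcal_lo=2200, kcal_hi=2500)
print([r["name"] for r in package], fat)
```

On this toy data the three lowest-fat gluten-free rows (B, D, E) total only 2100 kcal and are rejected, so the call returns the package A, B, D with total saturated fat 8. The number of candidate sets grows combinatorially with the table size, which is why enumeration of this kind does not scale.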
Notes
The evaluation of non-recursive SQL queries is polynomial with respect to data complexity. When we discuss complexity in this paper, we refer to data complexity.
This syntax slightly differs from the one presented in [6].
For ease of presentation, we show an ILP with nonnegative variables, but the mapping generalizes to arbitrary integer variables: negative variables negate the corresponding values in the query; for arbitrary bounds on each variable, add cardinality constraints to individual tuples.
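As a sketch of the substitution the last note describes, in notation that is our own assumption (the symbols \(c_j\), \(a_{ij}\), \(x_j\), \(y_j\), and \(u_j\) do not appear in the text above): suppose an ILP variable \(x_j\) is restricted to nonpositive integers. Writing \(x_j = -y_j\) with \(y_j \ge 0\) gives

\[
c_j x_j = (-c_j)\, y_j , \qquad a_{ij} x_j = (-a_{ij})\, y_j \quad \text{for every constraint } i ,
\]

so the program over \(y_j\) has only nonnegative variables, and every coefficient attached to that variable changes sign. If, as we read the note, each variable corresponds to a tuple whose attribute values are its coefficients, this amounts to negating that tuple's values in the query. A bound such as \(0 \le y_j \le u_j\) would then limit how many times that one tuple may appear in the package.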
References
Basu Roy, S., Amer-Yahia, S., Chawla, A., Das, G., Yu, C.: Constructing and exploring composite items. In: SIGMOD, pp. 843–854 (2010)
Baykasoglu, A., Dereli, T., Das, S.: Project team selection using fuzzy optimization approach. Cybern. Syst. 38(2), 155–185 (2007)
Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)
Bisschop, J.: AIMMS Optimization Modeling. Paragon Decision Technology, Haarlem (2006)
Bonatti, P., Calimeri, F., Leone, N., Ricca, F.: A 25-year perspective on logic programming. Chapter Answer Set Programming, pp. 159–182. Springer, Berlin (2010)
Brucato, M., Beltran, J.F., Abouzied, A., Meliou, A.: Scalable package queries in relational database systems. PVLDB 9(7), 576–587 (2016)
Brucato, M., Ramakrishna, R., Abouzied, A., Meliou, A.: PackageBuilder: from tuples to packages. PVLDB 7(13), 1593–1596 (2014)
Cook, W., Hartmann, M.: On the complexity of branch and cut methods for the traveling salesman problem. Polyhedral Comb. 1, 75–82 (1990)
De Choudhury, M., Feldman, M., Amer-Yahia, S., Golbandi, N., Lempel, R., Yu, C.: Automatic construction of travel itineraries using social breadcrumbs. In: HyperText, pp. 35–44 (2010)
Deng, T., Fan, W., Geerts, F.: On the complexity of package recommendation problems. In: PODS, pp. 261–272 (2012)
Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD, pp. 226–231 (1996)
Fang, M., Shivakumar, N., Garcia-Molina, H., Motwani, R., Ullman, J.D.: Computing iceberg queries efficiently. In: VLDB, pp. 299–310 (1998)
Finkel, R.A., Bentley, J.L.: Quad trees: a data structure for retrieval on composite keys. Acta Inform. 4(1), 1–9 (1974)
Gebser, M., Kaminski, R., Kaufmann, B., Schaub, T.: Clingo = ASP + control: Preliminary report. In: Leuschel, M., Schrijvers, T., (eds.) Technical Communications of the Thirtieth International Conference on Logic Programming (ICLP’14), volume arXiv:1405.3694v1 (2014). Theory and Practice of Logic Programming, Online Supplement
GNU Bison. https://www.gnu.org/software/bison/
Goemans, M.X., Williamson, D.P.: The primal-dual method for approximation algorithms and its application to network design problems. In: Approximation Algorithms for NP-Hard Problems, pp. 144–191 (1997)
Guha, S., Gunopulos, D., Koudas, N., Srivastava, D., Vlachos, M.: Efficient approximation of optimization queries under parametric aggregation constraints. In: VLDB, pp. 778–789 (2003)
Hartigan, J.A., Wong, M.A.: Algorithm AS 136: a k-means clustering algorithm. Appl. Stat. 28, 100–108 (1979)
Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)
IBM CPLEX Optimization Studio. http://www.ibm.com/software/commerce/optimization/cplex-optimizer/
Kalinin, A., Çetintemel, U., Zdonik, S.: Interactive data exploration using semantic windows. In: SIGMOD, pp. 505–516 (2014)
Kalinin, A., Çetintemel, U., Zdonik, S.B.: Searchlight: enabling integrated search and exploration over large multidimensional data. PVLDB 8(10), 1094–1105 (2015)
Kanellakis, P.C., Kuper, G.M., Revesz, P.Z.: Constraint query languages. J. Comput. Syst. Sci. 51(1), 26–52 (1995)
Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis, vol. 344. Wiley, Hoboken (2009)
Laporte, M., Novelli, N., Cicchetti, R., Lakhal, L.: Computing full and iceberg datacubes using partitions. In: ISMIS, pp. 244–254 (2002)
Lappas, T., Liu, K., Terzi, E.: Finding a team of experts in social networks. In: SIGKDD, pp. 467–476 (2009)
Meliou, A., Suciu, D.: Tiresias: the database oracle for how-to queries. In: SIGMOD, pp. 337–348 (2012)
Mirzasoleiman, B., Karbasi, A., Sarkar, R., Krause, A.: Distributed submodular maximization: identifying representative elements in massive data. In: NIPS (2013)
Ng, R.T., Wagner, A.S., Yin, Y.: Iceberg-cube computation with PC clusters. In: SIGMOD, pp. 25–36 (2001)
Padberg, M., Rinaldi, G.: A branch-and-cut algorithm for the resolution of large-scale symmetric traveling salesman problems. SIAM Rev. 33(1), 60–100 (1991)
Parameswaran, A.G., Venetis, P., Garcia-Molina, H.: Recommendation systems with complex constraints: a course recommendation perspective. ACM TOIS 29(4), 1–33 (2011)
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Pinel, F., Varshney, L.R.: Computational creativity for culinary recipes. In: CHI, pp. 439–442 (2014)
The Sloan Digital Sky Survey. http://www.sdss.org/
The TPC-H Benchmark. http://www.tpc.org/tpch/
Williamson, D.P., Shmoys, D.B.: The Design of Approximation Algorithms. Cambridge University Press, Cambridge (2011)
Xie, M., Lakshmanan, L.V.S., Wood, P.T.: Breaking out of the box of recommendations: from items to packages. In: RecSys, pp. 151–158 (2010)
Xie, M., Lakshmanan, L.V.S., Wood, P.T.: Composite recommendations: from items to packages. Front. Comput. Sci. 6(3), 264–277 (2012)
Xie, M., Lakshmanan, L.V.S., Wood, P.T.: Generating top-k packages via preference elicitation. PVLDB 7(14), 1941–1952 (2014)
Acknowledgements
This material is based upon work supported by the National Science Foundation under Grants IIS-1420941, IIS-1421322, and IIS-1453543.
About this article
Cite this article
Brucato, M., Abouzied, A. & Meliou, A. Package queries: efficient and scalable computation of high-order constraints. The VLDB Journal 27, 693–718 (2018). https://doi.org/10.1007/s00778-017-0483-4
DOI: https://doi.org/10.1007/s00778-017-0483-4