The VLDB Journal, Volume 27, Issue 5, pp 693–718

Package queries: efficient and scalable computation of high-order constraints

  • Matteo Brucato
  • Azza Abouzied
  • Alexandra Meliou
Special Issue Paper

Abstract

Traditional database queries follow a simple model: they define constraints that each tuple in the result must satisfy. This model is computationally efficient, as the database system can evaluate the query conditions on each tuple individually. However, many practical, real-world problems require a collection of result tuples to satisfy constraints collectively, rather than individually. In this paper, we present package queries, a new query model that extends traditional database queries to handle complex constraints and preferences over answer sets. We develop a full-fledged package query system, implemented on top of a traditional database engine. Our work makes several contributions. (1) We design PaQL, a SQL-based query language that supports the declarative specification of package queries. We prove that PaQL is at least as expressive as integer linear programming, and therefore, evaluation of package queries is NP-hard. (2) We present a fundamental evaluation strategy that combines the capabilities of databases and constraint optimization solvers to derive solutions to package queries. The core of our approach is a set of translation rules that transform a package query to an integer linear program. (3) We introduce an offline data partitioning strategy allowing query evaluation to scale to large data sizes. (4) We introduce SketchRefine, a scalable algorithm for package evaluation, with strong approximation guarantees [\((1 \pm \varepsilon )\)-factor approximation]. (5) We present a method for parallelizing the Refine phase of SketchRefine. (6) We present an empirical study of the efficiency gains of providing integer solvers with starting solutions. (7) We present extensive experiments over real-world and benchmark data. The results demonstrate that our methods are effective at deriving high-quality package results and achieve runtime performance that is an order of magnitude faster than directly using ILP solvers over large datasets.
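To make the idea of constraints over a whole answer set concrete, the following is a minimal sketch of how a package query can be read as a 0/1 integer program: one binary decision variable per candidate tuple, linear constraints over the selected tuples, and a linear objective. The relation `MEALS`, its columns, and the function `solve_package` are hypothetical names introduced only for illustration. The sketch uses exhaustive enumeration in place of an ILP solver and does not reproduce the translation rules or the SketchRefine algorithm described in the paper.

```python
from itertools import product

# Hypothetical relation: (name, kcal in units of 100, saturated fat in grams, gluten_free)
MEALS = [
    ("salad", 4, 1, True),
    ("pasta", 9, 3, False),
    ("curry", 8, 4, True),
    ("stew", 9, 5, True),
    ("rice", 7, 2, True),
    ("steak", 10, 9, True),
]


def solve_package(rows, size, kcal_lo, kcal_hi):
    """Pick `size` gluten-free rows whose total kcal lies in [kcal_lo, kcal_hi],
    minimizing total saturated fat. Returns (names, total_fat) or None."""
    # Per-tuple predicate: evaluated on each row individually, like a WHERE clause.
    candidates = [r for r in rows if r[3]]

    best = None
    # One 0/1 decision variable x_i per candidate tuple (x_i = 1: tuple is in the package).
    # Enumerating all assignments is exponential; an ILP solver would replace this loop.
    for x in product((0, 1), repeat=len(candidates)):
        # Package-level constraint: sum_i x_i = size
        if sum(x) != size:
            continue
        # Package-level constraint: kcal_lo <= sum_i kcal_i * x_i <= kcal_hi
        total_kcal = sum(xi * r[1] for xi, r in zip(x, candidates))
        if not kcal_lo <= total_kcal <= kcal_hi:
            continue
        # Objective: minimize sum_i fat_i * x_i
        total_fat = sum(xi * r[2] for xi, r in zip(x, candidates))
        if best is None or total_fat < best[1]:
            names = [r[0] for xi, r in zip(x, candidates) if xi]
            best = (names, total_fat)
    return best


if __name__ == "__main__":
    print(solve_package(MEALS, size=3, kcal_lo=20, kcal_hi=25))
```

No single tuple can satisfy the cardinality or calorie constraints on its own; they are properties of the package as a whole, which is why per-tuple evaluation is not enough. The enumeration visits every 0/1 assignment, so it is usable only on toy inputs.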

Keywords

Package queries · Integer linear programming · Approximation algorithm · SketchRefine · PaQL

Notes

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grants IIS-1420941, IIS-1421322, and IIS-1453543.

Supplementary material

778_2017_483_MOESM1_ESM.pdf (195 kb)
Supplementary material 1 (pdf 194 KB)

Copyright information

© Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  1. College of Information and Computer Sciences, University of Massachusetts, Amherst, USA
  2. Computer Science, New York University, Abu Dhabi, UAE
