The Pipelined Set Cover Problem

  • Kamesh Munagala
  • Shivnath Babu
  • Rajeev Motwani
  • Jennifer Widom
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3363)

Abstract

A classical problem in query optimization is to find the optimal ordering of a set of possibly correlated selections. We provide an abstraction of this problem as a generalization of set cover called pipelined set cover, where the sets are applied sequentially to the elements to be covered and the elements covered at each stage are discarded. We show that several natural heuristics for this NP-hard problem, such as the greedy set-cover heuristic and a local-search heuristic, can be analyzed using a linear-programming framework. These heuristics lead to efficient algorithms for pipelined set cover that can be applied to order possibly correlated selections in conventional database systems as well as data-stream processing systems. We use our linear-programming framework to show that the greedy and local-search algorithms are 4-approximations for pipelined set cover. We extend our analysis to minimize the ℓp-norm of the costs paid by the sets, where p ≥ 2 is an integer, to examine the improvement in performance when the total cost has increasing contribution from initial sets in the pipeline. Finally, we consider the online version of pipelined set cover and present a competitive algorithm with a logarithmic performance guarantee. Our analysis framework may be applicable to other problems in query optimization where it is important to account for correlations.
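The abstract leaves the cost model implicit. The following is a minimal Python sketch under an assumed model: each set S_i has a per-element cost c_i and charges c_i for every element that is still uncovered when S_i is applied, so an ordering π costs Σ_k c_{π(k)} · |elements not covered by π(1), ..., π(k−1)|. Under that assumption, the greedy set-cover heuristic repeatedly applies the set with the smallest cost per newly covered element. The function names (`per_set_costs`, `pipeline_cost`, `lp_norm_cost`, `greedy_order`, `optimal_cost`), the tuple representation, and the tiny instance are hypothetical and chosen only for illustration. The brute-force `optimal_cost` compares against the optimum on small inputs, and `lp_norm_cost` evaluates the ℓp-norm variant of the objective over the per-set costs. The sketch does not implement the paper's local-search or online algorithms.

```python
from itertools import permutations
from typing import FrozenSet, List, Sequence, Set, Tuple

# Hypothetical representation for this sketch: (name, per-element cost, covered elements).
PipelineSet = Tuple[str, float, FrozenSet[int]]


def per_set_costs(order: Sequence[PipelineSet], universe: Set[int]) -> List[float]:
    """Cost paid by each set when the sets are applied in `order`.

    Assumed model: a set charges its per-element cost for every element that
    is still uncovered when it is applied; covered elements are discarded.
    """
    uncovered = set(universe)
    costs = []
    for _name, cost, covers in order:
        costs.append(cost * len(uncovered))
        uncovered -= covers
    return costs


def pipeline_cost(order: Sequence[PipelineSet], universe: Set[int]) -> float:
    """Total (sum) cost of an ordering."""
    return sum(per_set_costs(order, universe))


def lp_norm_cost(order: Sequence[PipelineSet], universe: Set[int], p: int) -> float:
    """l_p-norm of the per-set costs; p = 1 recovers the total cost."""
    return sum(c ** p for c in per_set_costs(order, universe)) ** (1.0 / p)


def greedy_order(sets: Sequence[PipelineSet], universe: Set[int]) -> List[PipelineSet]:
    """Greedy set-cover rule: repeatedly apply the set with the smallest
    cost per newly covered element."""
    remaining = list(sets)
    uncovered = set(universe)
    order: List[PipelineSet] = []
    while remaining:
        useful = [s for s in remaining if s[2] & uncovered]
        if not useful:
            # No remaining set covers anything new, so every one of them sees
            # the same uncovered elements and their relative order is irrelevant.
            order.extend(remaining)
            break
        best = min(useful, key=lambda s: s[1] / len(s[2] & uncovered))
        order.append(best)
        remaining.remove(best)
        uncovered -= best[2]
    return order


def optimal_cost(sets: Sequence[PipelineSet], universe: Set[int]) -> float:
    """Exhaustive search over all orderings; feasible only for tiny inputs."""
    return min(pipeline_cost(p, universe) for p in permutations(sets))


# Tiny illustrative instance (made up for this sketch).
universe = set(range(6))
sets = [
    ("A", 1.0, frozenset({0, 1, 2})),
    ("B", 2.0, frozenset({2, 3, 4, 5})),
    ("C", 1.0, frozenset({4})),
]
order = greedy_order(sets, universe)
print([name for name, _, _ in order])  # ['A', 'B', 'C']
print(pipeline_cost(order, universe))  # 12.0
print(optimal_cost(sets, universe))    # 12.0
```

On this instance greedy happens to match the exhaustive optimum. The 4-approximation stated in the abstract is a worst-case bound, so on other instances greedy can be strictly worse than optimal.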

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Kamesh Munagala, Computer Science Department, Duke University
  • Shivnath Babu, Computer Science Department, Stanford University
  • Rajeev Motwani, Computer Science Department, Stanford University
  • Jennifer Widom, Computer Science Department, Stanford University