The Pipelined Set Cover Problem

  • Conference paper
Database Theory - ICDT 2005 (ICDT 2005)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 3363))

Abstract

A classical problem in query optimization is to find the optimal ordering of a set of possibly correlated selections. We provide an abstraction of this problem as a generalization of set cover called pipelined set cover, where the sets are applied sequentially to the elements to be covered and the elements covered at each stage are discarded. We show that several natural heuristics for this NP-hard problem, such as the greedy set-cover heuristic and a local-search heuristic, can be analyzed using a linear-programming framework. These heuristics lead to efficient algorithms for pipelined set cover that can be applied to order possibly correlated selections in conventional database systems as well as data-stream processing systems. We use our linear-programming framework to show that the greedy and local-search algorithms are 4-approximations for pipelined set cover. We extend our analysis to minimize the L_p-norm of the costs paid by the sets, where p ≥ 2 is an integer, to examine the improvement in performance when the total cost has increasing contribution from initial sets in the pipeline. Finally, we consider the online version of pipelined set cover and present a competitive algorithm with a logarithmic performance guarantee. Our analysis framework may be applicable to other problems in query optimization where it is important to account for correlations.
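
To make the setting concrete, the following is a minimal sketch of a greedy ordering heuristic for pipelined set cover. The abstract does not spell out the cost model, so the sketch assumes one: each set has a per-element cost, a set is charged that cost for every element still uncovered when it is applied, and the greedy rule picks the set that covers the most remaining elements per unit cost. The function names (`pipeline_cost`, `greedy_order`, `optimal_cost`) and the toy instance are illustrative and not taken from the paper.

```python
from itertools import permutations


def pipeline_cost(order, sets, costs, universe):
    """Total cost of applying sets in the given order (assumed cost model).

    Each set pays its per-element cost once for every element that is
    still uncovered when the set is applied; covered elements are discarded.
    """
    remaining = set(universe)
    total = 0.0
    for i in order:
        if not remaining:
            break
        total += costs[i] * len(remaining)
        remaining -= sets[i]
    return total


def greedy_order(sets, costs, universe):
    """Greedy heuristic: repeatedly pick the unused set that covers the
    most still-uncovered elements per unit cost (ties go to the lower index)."""
    remaining = set(universe)
    unused = set(range(len(sets)))
    order = []
    while remaining and unused:
        best = max(unused,
                   key=lambda i: (len(sets[i] & remaining) / costs[i], -i))
        if not sets[best] & remaining:
            break  # no unused set covers anything that is left
        order.append(best)
        unused.discard(best)
        remaining -= sets[best]
    return order


def optimal_cost(sets, costs, universe):
    """Brute-force optimum over all orderings; only viable for tiny inputs."""
    return min(pipeline_cost(p, sets, costs, universe)
               for p in permutations(range(len(sets))))


# Toy instance (hypothetical data): six elements, four sets.
universe = range(6)
sets = [{0, 1, 2, 3}, {3, 4}, {4, 5}, {0, 5}]
costs = [1.0, 2.0, 1.0, 2.0]

order = greedy_order(sets, costs, universe)
greedy_total = pipeline_cost(order, sets, costs, universe)
best_total = optimal_cost(sets, costs, universe)
print(order, greedy_total, best_total)
```

In the query-optimization reading, the elements play the role of tuples and each set holds the tuples that one selection discards, so an ordering of sets corresponds to an ordering of filters. On this small instance the greedy order happens to match the brute-force optimum; the 4-approximation stated in the abstract is a worst-case bound, which the brute-force helper can be used to check on other small inputs.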

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Munagala, K., Babu, S., Motwani, R., Widom, J. (2004). The Pipelined Set Cover Problem. In: Eiter, T., Libkin, L. (eds) Database Theory - ICDT 2005. ICDT 2005. Lecture Notes in Computer Science, vol 3363. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30570-5_6

  • DOI: https://doi.org/10.1007/978-3-540-30570-5_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24288-8

  • Online ISBN: 978-3-540-30570-5

  • eBook Packages: Computer Science (R0)
