Abstract
A classical problem in query optimization is to find the optimal ordering of a set of possibly correlated selections. We provide an abstraction of this problem as a generalization of set cover called pipelined set cover, where the sets are applied sequentially to the elements to be covered and the elements covered at each stage are discarded. We show that several natural heuristics for this NP-hard problem, such as the greedy set-cover heuristic and a local-search heuristic, can be analyzed using a linear-programming framework. These heuristics lead to efficient algorithms for pipelined set cover that can be applied to order possibly correlated selections in conventional database systems as well as data-stream processing systems. We use our linear-programming framework to show that the greedy and local-search algorithms are 4-approximations for pipelined set cover. We extend our analysis to minimize the ℓ_p-norm of the costs paid by the sets, where p ≥ 2 is an integer, to examine the improvement in performance when the total cost has increasing contribution from initial sets in the pipeline. Finally, we consider the online version of pipelined set cover and present a competitive algorithm with a logarithmic performance guarantee. Our analysis framework may be applicable to other problems in query optimization where it is important to account for correlations.
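The greedy heuristic mentioned in the abstract can be illustrated with a small sketch. The cost model used here is an assumption, since the abstract does not spell it out: each set (a selection predicate) has a per-element cost, pays that cost for every element still in the pipeline when it is applied, and then discards the elements it covers. The names `pipeline_cost`, `greedy_order`, and the toy data are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from typing import Dict, Hashable, List, Set


def pipeline_cost(order: List[str],
                  sets: Dict[str, Set[Hashable]],
                  costs: Dict[str, float],
                  universe: Set[Hashable]) -> float:
    """Total cost of applying the sets in the given order (assumed cost model)."""
    remaining = set(universe)
    total = 0.0
    for name in order:
        # Assumption: a set pays its per-element cost for every element
        # that reaches it, then removes the elements it covers.
        total += costs[name] * len(remaining)
        remaining -= sets[name]
    return total


def greedy_order(sets: Dict[str, Set[Hashable]],
                 costs: Dict[str, float],
                 universe: Set[Hashable]) -> List[str]:
    """Repeatedly pick the set covering the most remaining elements per unit cost."""
    remaining = set(universe)
    unused = set(sets)
    order: List[str] = []
    while unused:
        # sorted() makes tie-breaking deterministic (alphabetical by name).
        best = max(sorted(unused),
                   key=lambda s: len(sets[s] & remaining) / costs[s])
        order.append(best)
        unused.remove(best)
        remaining -= sets[best]
    return order


# Toy instance: three correlated predicates over six tuples, unit costs.
universe = {1, 2, 3, 4, 5, 6}
sets = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {5, 6}}
costs = {"A": 1.0, "B": 1.0, "C": 1.0}

order = greedy_order(sets, costs, universe)
print(order, pipeline_cost(order, sets, costs, universe))
print(pipeline_cost(["B", "A", "C"], sets, costs, universe))
```

On this toy instance the greedy order places `C` before `B` because, once `A` has been applied, `B` covers only one remaining element while `C` covers two. This is the effect of correlation that an independence assumption would miss.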
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Munagala, K., Babu, S., Motwani, R., Widom, J. (2004). The Pipelined Set Cover Problem. In: Eiter, T., Libkin, L. (eds) Database Theory - ICDT 2005. ICDT 2005. Lecture Notes in Computer Science, vol 3363. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30570-5_6
DOI: https://doi.org/10.1007/978-3-540-30570-5_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24288-8
Online ISBN: 978-3-540-30570-5
eBook Packages: Computer Science, Computer Science (R0)