On Rectangular Partitionings in Two Dimensions: Algorithms, Complexity and Applications
Partitioning a multi-dimensional data set into rectangular partitions subject to certain constraints is an important problem that arises in many database applications, including histogram-based selectivity estimation, load balancing, and the construction of index structures. While provably optimal and efficient algorithms exist for partitioning one-dimensional data, the multi-dimensional problem has received less attention, except for a few special cases. As a result, the heuristic partitioning techniques used in practice are not well understood and come with no guarantees on the quality of the solution. In this paper, we present algorithmic and complexity-theoretic results for the fundamental problem of partitioning a two-dimensional array into rectangular tiles of arbitrary size in a way that minimizes the number of tiles required to satisfy a given constraint. Our main results are approximation algorithms for several partitioning problems that provably approximate the optimal solutions within small constant factors and that run in linear or close to linear time. We also establish the NP-hardness of several partitioning problems; it is therefore unlikely that efficient (i.e., polynomial-time) algorithms exist for solving these problems exactly.
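To make the one-dimensional case concrete, the following is a minimal sketch and not an algorithm taken from the paper. It assumes non-negative values and one particular constraint, namely that the sum of each interval may not exceed a given budget; under those assumptions a left-to-right greedy scan uses the fewest intervals. The names `greedy_partition_1d` and `budget` are hypothetical.

```python
from typing import List


def greedy_partition_1d(values: List[float], budget: float) -> List[int]:
    """Split `values` into the fewest contiguous intervals whose sums are <= budget.

    Assumes all values are non-negative. Returns the start index of each interval.
    """
    if not values:
        return []
    starts = [0]
    running = 0.0
    for i, v in enumerate(values):
        if v > budget:
            # A single element already violates the constraint: no partition exists.
            raise ValueError(f"element at index {i} exceeds the budget")
        if running + v > budget:
            # Extending the current interval would break the constraint,
            # so close it and start a new interval at position i.
            starts.append(i)
            running = v
        else:
            running += v
    return starts


# Illustrative input (assumed data, not from the paper).
example_starts = greedy_partition_1d([3, 1, 4, 1, 5, 9, 2, 6], budget=10)
```

Extending each interval as far as the constraint allows is what makes the greedy scan work in one dimension. In two dimensions tiles can be arranged in many non-hierarchical ways, which is where the hardness results mentioned in the abstract come in.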
We also discuss a few applications in which partitioning problems arise. One such application is the construction of multi-dimensional histograms. Our results, for example, give an efficient algorithm for constructing V-Optimal histograms, which are known to be the most accurate histograms in several selectivity estimation problems. Our algorithms are the first to provide guaranteed bounds on the quality of the solution.
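As an illustration of the kind of per-tile error a histogram construction has to evaluate, the sketch below computes the sum of squared deviations from the mean inside a rectangular tile, using two-dimensional prefix sums so that each query takes constant time. It assumes the V-Optimal error is this within-bucket sum of squared errors, which is the common definition; the helper names (`prefix_sums`, `rect_total`, `tile_sse`) and the sample grid are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Grid = List[List[float]]


def prefix_sums(grid: Grid) -> Tuple[Grid, Grid]:
    """Build prefix-sum tables of the values and of their squares.

    Entry [r][c] of each table covers the sub-array grid[0:r][0:c].
    """
    rows, cols = len(grid), len(grid[0])
    s = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    q = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            v = grid[r][c]
            s[r + 1][c + 1] = v + s[r][c + 1] + s[r + 1][c] - s[r][c]
            q[r + 1][c + 1] = v * v + q[r][c + 1] + q[r + 1][c] - q[r][c]
    return s, q


def rect_total(p: Grid, r0: int, c0: int, r1: int, c1: int) -> float:
    """Total over the half-open tile [r0, r1) x [c0, c1) by inclusion-exclusion."""
    return p[r1][c1] - p[r0][c1] - p[r1][c0] + p[r0][c0]


def tile_sse(s: Grid, q: Grid, r0: int, c0: int, r1: int, c1: int) -> float:
    """Sum of squared deviations from the tile mean, in O(1) per query."""
    n = (r1 - r0) * (c1 - c0)
    total = rect_total(s, r0, c0, r1, c1)
    return rect_total(q, r0, c0, r1, c1) - total * total / n


# Illustrative 2x2 array (assumed data).
s_table, q_table = prefix_sums([[1, 2], [3, 4]])
whole_tile_error = tile_sse(s_table, q_table, 0, 0, 2, 2)
top_row_error = tile_sse(s_table, q_table, 0, 0, 1, 2)
```

A partitioning algorithm could call `tile_sse` for each candidate tile and compare the result against an error bound; the choice of which tiles to try is the hard part and is not shown here.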
Keywords: Partitioning Problem, Database Application, Cumulative Function, Hierarchical Partition, Block Match Algorithm