# On Rectangular Partitionings in Two Dimensions: Algorithms, Complexity and Applications

## Abstract

Partitioning a multi-dimensional data set into rectangular partitions subject to certain constraints is an important problem that arises in many database applications, including histogram-based selectivity estimation, load balancing, and construction of index structures. While provably optimal and efficient algorithms exist for partitioning one-dimensional data, the multi-dimensional problem has received less attention, except for a few special cases. As a result, the heuristic partitioning techniques used in practice are not well understood and come with no guarantees on the quality of the solution. In this paper, we present algorithmic and complexity-theoretic results for the fundamental problem of partitioning a two-dimensional array into rectangular tiles of arbitrary size so as to minimize the number of tiles required to satisfy a given constraint. Our main results are approximation algorithms for several partitioning problems that provably approximate the optimal solutions within small constant factors and run in linear or near-linear time. We also establish the NP-hardness of several partitioning problems, so it is unlikely that efficient (i.e., polynomial-time) algorithms exist for solving these problems *exactly*.
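
To make the problem statement concrete, here is a minimal sketch that tiles a two-dimensional array of nonnegative weights so that every tile's total weight is at most a bound `delta`. It uses the two-dimensional cumulative (prefix-sum) function to evaluate any rectangle's sum in constant time, together with a simple hierarchical strategy that recursively halves the longer side of any tile that is too heavy. This is an illustrative baseline of the heuristic kind mentioned above, not the paper's approximation algorithm, and it carries no guarantee on the number of tiles. The names `prefix_sums`, `rect_sum`, and `hierarchical_tiling`, the half-open `(r0, c0, r1, c1)` tile encoding, and the max-weight constraint are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical tile encoding: (r0, c0, r1, c1) covers rows r0..r1-1 and columns c0..c1-1.
Rect = Tuple[int, int, int, int]


def prefix_sums(a: Sequence[Sequence[int]]) -> List[List[int]]:
    """Return P with P[i][j] = sum of a[0..i-1][0..j-1] (the 2D cumulative function)."""
    n, m = len(a), len(a[0])
    p = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            p[i + 1][j + 1] = a[i][j] + p[i][j + 1] + p[i + 1][j] - p[i][j]
    return p


def rect_sum(p: List[List[int]], t: Rect) -> int:
    """Sum of the entries inside tile t, in O(1) by inclusion-exclusion."""
    r0, c0, r1, c1 = t
    return p[r1][c1] - p[r0][c1] - p[r1][c0] + p[r0][c0]


def hierarchical_tiling(a: Sequence[Sequence[int]], delta: int) -> List[Rect]:
    """Recursively halve the longer side of any tile heavier than delta.

    Illustrative heuristic only (assumes nonnegative weights): every returned
    tile weighs at most delta, but the tile count may be far from optimal.
    """
    if any(x > delta for row in a for x in row):
        raise ValueError("a single cell exceeds delta; no valid tiling exists")
    p = prefix_sums(a)
    tiles: List[Rect] = []
    stack: List[Rect] = [(0, 0, len(a), len(a[0]))]
    while stack:
        r0, c0, r1, c1 = stack.pop()
        if rect_sum(p, (r0, c0, r1, c1)) <= delta:
            tiles.append((r0, c0, r1, c1))
        elif r1 - r0 >= c1 - c0:  # taller than wide: cut between two rows
            mid = (r0 + r1) // 2
            stack += [(r0, c0, mid, c1), (mid, c0, r1, c1)]
        else:  # wider than tall: cut between two columns
            mid = (c0 + c1) // 2
            stack += [(r0, c0, r1, mid), (r0, mid, r1, c1)]
    return tiles
```

For example, `hierarchical_tiling([[1, 2], [3, 4]], 4)` returns three tiles: the whole top row (weight 3) and the two cells of the bottom row (weights 3 and 4). Because any single cell already satisfies the constraint after the initial check, the recursion always terminates.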

We also discuss a few applications in which partitioning problems arise. One of them is the construction of multi-dimensional histograms. For example, our results yield an efficient algorithm for constructing *V-Optimal* histograms, which are known to be the most accurate histograms for several selectivity estimation problems. Our algorithms are the first to provide guaranteed bounds on the quality of the solution.
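
As a rough illustration of the histogram objective, the sketch below scores a given rectangular tiling by a V-Optimal-style criterion. It assumes that each tile (bucket) approximates its cells by their mean and that the error of the tiling is the total sum of squared deviations from those means. With cumulative sums of both the values and their squares, the error of a tile $R$ with $S_1 = \sum_{x \in R} x$ and $S_2 = \sum_{x \in R} x^2$ is $S_2 - S_1^2 / |R|$, so each tile is scored in constant time. The sketch only evaluates a tiling and does not search for a good one; the names `cumulative`, `box`, and `v_optimal_error` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical tile encoding: (r0, c0, r1, c1), half-open in both dimensions.
Rect = Tuple[int, int, int, int]


def cumulative(a: Sequence[Sequence[float]]) -> List[List[float]]:
    """cum[i][j] = sum of a[0..i-1][0..j-1]."""
    n, m = len(a), len(a[0])
    cum = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            cum[i + 1][j + 1] = a[i][j] + cum[i][j + 1] + cum[i + 1][j] - cum[i][j]
    return cum


def box(cum: List[List[float]], t: Rect) -> float:
    """Sum over tile t via inclusion-exclusion on the cumulative table."""
    r0, c0, r1, c1 = t
    return cum[r1][c1] - cum[r0][c1] - cum[r1][c0] + cum[r0][c0]


def v_optimal_error(a: Sequence[Sequence[float]], tiles: List[Rect]) -> float:
    """Total squared deviation of every cell from its tile's mean (assumed objective)."""
    s = cumulative(a)
    sq = cumulative([[x * x for x in row] for row in a])
    total = 0.0
    for t in tiles:
        area = (t[2] - t[0]) * (t[3] - t[1])
        s1, s2 = box(s, t), box(sq, t)
        total += s2 - s1 * s1 / area  # sum (x - mean)^2 = S2 - S1^2 / |R|
    return total
```

For the array `[[1, 1], [5, 5]]`, a single tile has error 16 (every cell is 2 away from the mean 3), splitting between the two columns still gives 16, while splitting between the two rows gives 0. This is the kind of difference a V-Optimal partitioning tries to exploit when choosing tile boundaries.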

## Keywords

Partitioning Problem, Database Application, Cumulative Function, Hierarchical Partition, Block Match Algorithm
