# CRB-Tree: An Efficient Indexing Scheme for Range-Aggregate Queries

## Abstract

We propose a new indexing scheme, called the CRB-tree, for efficiently answering range-aggregate queries. The range-aggregate problem is defined as follows: given a set of weighted points in $\mathbb{R}^d$, compute the aggregate of the weights of the points that lie inside a $d$-dimensional query rectangle. In this paper we focus on the range-COUNT, SUM, and AVG aggregates. First, we develop an indexing scheme for answering two-dimensional range-COUNT queries that uses $O(N/B)$ disk blocks and answers a query in $O(\log_B N)$ I/Os, where $N$ is the number of input points and $B$ is the disk block size. This is the first optimal index structure for the 2D range-COUNT problem. The index can be extended to obtain a near-linear-size structure for answering range-SUM queries using $O(\log_B N)$ I/Os. We also obtain similar bounds for rectangle-intersection aggregate queries, in which the input is a set of weighted rectangles and a query asks for the aggregate of the weights of those input rectangles that overlap the query rectangle. This result immediately improves a recent result on temporal-aggregate queries. Our indexing scheme can be dynamized and extended to higher dimensions. Finally, we demonstrate the practical efficiency of our index by comparing its performance against the kdB-tree. For a dataset of around 100 million points, the CRB-tree answers queries 8 to 10 times faster than the kdB-tree. Furthermore, unlike other indexing schemes, the query performance of the CRB-tree is oblivious to the distribution of the input points and to the placement, shape, and size of the query rectangle.
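
To make the two problem statements concrete, the following is a minimal brute-force sketch of what a range-aggregate query and a rectangle-intersection aggregate query return. It is not the CRB-tree and makes no attempt at I/O efficiency: it scans every input item, and serves only as a reference for the query semantics. The names `Rect`, `range_aggregate`, and `rectangle_aggregate` are hypothetical, and treating rectangle boundaries as closed (inclusive) is an assumption the abstract does not state.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass(frozen=True)
class Rect:
    """Axis-parallel d-dimensional rectangle (hypothetical helper type)."""
    lo: Point  # lower corner, one coordinate per dimension
    hi: Point  # upper corner, one coordinate per dimension

    def contains(self, p: Point) -> bool:
        # Assumption: boundaries are closed, so points on an edge count.
        return all(l <= x <= h for l, x, h in zip(self.lo, p, self.hi))

    def overlaps(self, other: "Rect") -> bool:
        # Two boxes overlap iff their extents intersect in every dimension.
        return all(
            a_lo <= b_hi and b_lo <= a_hi
            for a_lo, a_hi, b_lo, b_hi in zip(self.lo, self.hi, other.lo, other.hi)
        )


def _summarize(selected: Sequence[float]) -> Tuple[int, float, Optional[float]]:
    """Return (COUNT, SUM, AVG); AVG is None when nothing was selected."""
    count = len(selected)
    total = float(sum(selected))
    return count, total, (total / count if count else None)


def range_aggregate(points: Sequence[Point], weights: Sequence[float], query: Rect):
    """Aggregate the weights of the points lying inside the query rectangle."""
    return _summarize([w for p, w in zip(points, weights) if query.contains(p)])


def rectangle_aggregate(rects: Sequence[Rect], weights: Sequence[float], query: Rect):
    """Aggregate the weights of the input rectangles overlapping the query."""
    return _summarize([w for r, w in zip(rects, weights) if query.overlaps(r)])
```

For example, with points `(1, 1)`, `(2, 3)`, `(5, 5)` carrying weights `2.0`, `4.0`, `10.0` and the query rectangle `Rect((0, 0), (3, 3))`, the first two points fall inside the query, so `range_aggregate` returns a COUNT of 2, a SUM of 6.0, and an AVG of 3.0. AVG follows directly from SUM and COUNT, which is how this sketch computes it.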
