The VLDB Journal, Volume 23, Issue 4, pp 591–607

Instance-level worst-case query bounds on R-trees

  • Yufei Tao (Email author)
  • Yi Yang
  • Xiaocheng Hu
  • Cheng Sheng
  • Shuigeng Zhou
Regular Paper

Abstract

Despite its significant impact on the database area, the R-tree is often criticized for its lack of good worst-case guarantees. For example, in range search (where we want to report all the data points in a query rectangle), it is known that on adversarially designed datasets and queries, an R-tree can be as slow as a sequential scan that simply reads all the data points. Nevertheless, R-trees work so well on real data that they have been widely implemented in commercial systems. This stark contrast has caused long-term controversy between practitioners and theoreticians as to whether this structure deserves its fame. This paper provides theoretical evidence that, somewhat surprisingly, R-trees are efficient in the worst case for range search on many real datasets. Given any integer \(K\), we explain how to obtain an upper bound on the cost of answering all (i.e., infinitely many) range queries retrieving at most \(K\) objects. On practical data, the upper bound is only a fraction of the overhead of sequential scan (unless, apparently, \(K\) is of the same order as the dataset size). Our upper bounds are tight up to a constant factor, namely they cannot be lowered by more than a factor of \(O(1)\) while still capturing the most expensive queries. Our upper bounds can be calculated in constant time by remembering only three integers. These integers, in turn, are generated from only the leaf minimum bounding rectangles (MBRs) of an R-tree, but not the leaf nodes themselves. In practice, the internal nodes are often buffered in memory, so that the aforementioned integers can be efficiently maintained along with the data updates and made available to a query optimizer at any time. Furthermore, our analytical framework introduces the instance-level query bound as a new technique for evaluating the efficiency of heuristic structures in a theory-flavored manner (previously, experimentation was the dominant assessment method).
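
As an illustration of the range-search cost model the abstract refers to, the following is a minimal Python sketch, not the paper's method: it does not compute the upper bound or the three integers mentioned above. It assumes two-dimensional points, models an R-tree only by its leaf level (a list of point groups), and counts how many leaves a query must read, namely those whose MBR intersects the query rectangle. All names (`range_search`, `good_leaves`, `bad_leaves`) are hypothetical and chosen for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi)


def contains(q: Rect, p: Point) -> bool:
    """True if point p lies inside (or on the border of) rectangle q."""
    return q[0] <= p[0] <= q[2] and q[1] <= p[1] <= q[3]


def intersects(a: Rect, b: Rect) -> bool:
    """True if rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def mbr(points: List[Point]) -> Rect:
    """Minimum bounding rectangle of a non-empty group of points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def range_search(leaves: List[List[Point]], q: Rect):
    """Return (points inside q, number of leaf nodes that had to be read)."""
    result, leaves_read = [], 0
    for leaf in leaves:
        # A leaf is read only when its MBR intersects the query rectangle.
        if intersects(mbr(leaf), q):
            leaves_read += 1
            result.extend(p for p in leaf if contains(q, p))
    return result, leaves_read


# Same four points, grouped into leaves in two different ways (assumed data).
good_leaves = [[(0, 0), (1, 1)], [(10, 10), (11, 11)]]
bad_leaves = [[(0, 0), (11, 11)], [(1, 1), (10, 10)]]
q = (4, 4, 6, 6)  # a query rectangle that contains no data point

print(range_search(good_leaves, q))  # reads no leaf
print(range_search(bad_leaves, q))   # reads every leaf, like a sequential scan
```

With the second grouping, the query returns nothing yet touches every leaf, which is the kind of adversarial behavior the worst-case criticism describes; with the first grouping, the same query reads no leaf at all.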

Keywords

R-tree, Performance analysis, Theory, Instance-level worst-case bound

Notes

Acknowledgments

The research of Yufei Tao, Xiaocheng Hu, and Cheng Sheng was supported by Grants GRF 4165/11, GRF 4164/12, and GRF 4168/13 from HKRGC. The research of Yi Yang and Shuigeng Zhou was supported by Research Innovation Program of Shanghai Municipal Education Committee under Grant No. 13ZZ003.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Yufei Tao (1), Email author
  • Yi Yang (2)
  • Xiaocheng Hu (1)
  • Cheng Sheng (1)
  • Shuigeng Zhou (2)

  1. Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong
  2. Fudan University, Shanghai, China
