Abstract
The quest to process data in high-dimensional spaces has produced a number of innovative indexing mechanisms. Choosing an appropriate indexing method for a given data set requires careful consideration of the data's properties, how the data set is constructed, and the types of queries to be supported. We present a new indexing method that supports efficient point queries, range queries, and k-nearest neighbor queries. Our method indexes objects dynamically using algebraic techniques, and it can substantially reduce the negative impact of the “curse of dimensionality”. In particular, our method recursively partitions the data space into hypercubes of a given capacity and labels each hypercube using the Cantor pairing function, so that all objects in the same hypercube share the same label. Because the Cantor pairing function is bijective and cheap to compute, high-dimensional vectors and scalar labels can be mapped to each other efficiently. During partitioning and labeling, a subspace is split whenever the number of data items it contains exceeds its capacity. From a data structure point of view, our method constructs a tree, which we call a PL-tree, in which each parent node contains a number of labels and child pointers. We compare our method with popular indexing algorithms including the R*-tree, X-tree, quadtree, and iDistance. Our numerical results show that dynamic PL-tree indexing significantly outperforms these existing indexing mechanisms.
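The abstract leaves the mechanics of the labeling implicit. The Python sketch below illustrates the two ingredients it names: the Cantor pairing function pi(x, y) = (x + y)(x + y + 1)/2 + y, a bijection from pairs of natural numbers to natural numbers, and capacity-driven recursive splitting of hypercubes. It is not the PL-tree algorithm from the paper. The way pairing is extended to d dimensions (pairing neighbours level by level), the halving of each hypercube into 2^d children, the `max_depth` guard, and every name used here (`cantor_pair`, `pair_vector`, `cell_label`, `Node`, and so on) are assumptions made for illustration.

```python
import math
from typing import Optional, Sequence


def cantor_pair(x: int, y: int) -> int:
    """Cantor pairing function: a bijection from pairs of naturals to naturals."""
    s = x + y
    return s * (s + 1) // 2 + y


def cantor_unpair(z: int) -> tuple[int, int]:
    """Invert cantor_pair with an exact integer square root (no float rounding)."""
    w = (math.isqrt(8 * z + 1) - 1) // 2  # index of the diagonal x + y = w holding z
    y = z - w * (w + 1) // 2
    return w - y, y


def pair_vector(v: Sequence[int]) -> int:
    """Assumed d-dimensional extension: pair neighbours level by level (d >= 1)."""
    level = list(v)
    while len(level) > 1:
        nxt = [cantor_pair(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element is carried up unpaired
        level = nxt
    return level[0]


def unpair_vector(label: int, d: int) -> list[int]:
    """Recover the d coordinates encoded by pair_vector."""
    sizes = [d]
    while sizes[-1] > 1:
        sizes.append((sizes[-1] + 1) // 2)
    level = [label]
    for size in reversed(sizes[:-1]):
        nxt: list[int] = []
        for i, z in enumerate(level):
            if 2 * i + 1 < size:
                nxt.extend(cantor_unpair(z))
            else:
                nxt.append(z)  # this element was carried, not paired
        level = nxt
    return level


def cell_label(point: Sequence[float], lower: Sequence[float], side: float) -> int:
    """Label of the grid hypercube (edge length `side`) that contains `point`."""
    coords = [int((p - lo) // side) for p, lo in zip(point, lower)]
    if any(c < 0 for c in coords):
        raise ValueError("point lies below the lower corner of the space")
    return pair_vector(coords)


class Node:
    """Hypothetical toy node: a leaf holds up to `capacity` points; after a split
    it maps the Cantor labels of its 2^d half-size sub-hypercubes to children."""

    def __init__(self, lower: Sequence[float], side: float, capacity: int,
                 depth: int = 0, max_depth: int = 32):
        self.lower, self.side, self.capacity = list(lower), side, capacity
        self.depth, self.max_depth = depth, max_depth  # cap stops endless splits on duplicates
        self.points: list[tuple[float, ...]] = []
        self.children: Optional[dict[int, "Node"]] = None

    def _coords(self, p: Sequence[float]) -> list[int]:
        # 0/1 position of p along each axis; min() keeps points on the upper face inside.
        half = self.side / 2
        return [min(int((x - lo) // half), 1) for x, lo in zip(p, self.lower)]

    def insert(self, p: Sequence[float]) -> None:
        if self.children is not None:
            coords = self._coords(p)
            label = pair_vector(coords)
            if label not in self.children:
                half = self.side / 2
                child_lower = [lo + c * half for lo, c in zip(self.lower, coords)]
                self.children[label] = Node(child_lower, half, self.capacity,
                                            self.depth + 1, self.max_depth)
            self.children[label].insert(p)
            return
        self.points.append(tuple(p))
        if len(self.points) > self.capacity and self.depth < self.max_depth:
            pending, self.points, self.children = self.points, [], {}
            for q in pending:  # redistribute into labeled sub-hypercubes
                self.insert(q)

    def contains(self, p: Sequence[float]) -> bool:
        """Point query: follow labels down to a leaf, then scan it."""
        if self.children is None:
            return tuple(p) in self.points
        child = self.children.get(pair_vector(self._coords(p)))
        return child is not None and child.contains(p)
```

For example, `cell_label([0.2, 0.7], [0.0, 0.0], 0.5)` and `cell_label([0.4, 0.9], [0.0, 0.0], 0.5)` both return 2, because both points fall in the cell with grid coordinates (0, 1). Since pairing is a bijection, `unpair_vector(pair_vector(v), len(v)) == v` for any vector of naturals, so a label can stand in for its cell's coordinates. The level-by-level pairing is a deliberate choice in this sketch: pi(x, y) is roughly (x + y)^2 / 2, so folding coordinates one at a time roughly doubles the label's bit length per dimension, whereas pairing in a balanced tree doubles it only once per level, keeping labels to roughly linear size in d.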
References
Arge, L., de Berg, M., Haverkort, H.J., Yi, K.: The priority R-tree: A practically efficient and worst-case optimal R-tree. In: Proceedings of ACM/SIGMOD Annual Conference on Management of Data (SIGMOD), pp. 347–358 (2004)
Beckmann, N., Kriegel, H.-P., Schneider, R., Seeger, B.: The R*-tree: An efficient and robust access method for points and rectangles. In: Proceedings of ACM/SIGMOD Annual Conference on Management of Data (SIGMOD), pp. 322–331 (1990)
Berchtold, S., Böhm, C., Kriegel, H.-P.: The pyramid-technique: Towards breaking the curse of dimensionality. In: Proceedings of ACM/SIGMOD Annual Conference on Management of Data (SIGMOD), pp. 142–153 (1998)
Berchtold, S., Keim, D.A., Kriegel, H.-P.: The X-tree: An index structure for high-dimensional data. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 28–39 (1996)
Böhm, C., Berchtold, S., Keim, D.A.: Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases. ACM Comput. Surv. 33(3), 322–373 (2001)
Cantor, G.: Contributions to the Founding of the Theory of Transfinite Numbers. Dover, New York (1955); originally published 1915
Ciaccia, P., Patella, M., Zezula, P.: M-tree: An efficient access method for similarity search in metric spaces. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 426–435 (1997)
Corral, A., Cañadas, J., Vassilakopoulos, M.: Processing distance-based queries in multidimensional data spaces using R-trees. In: Manolopoulos, Y., Evripidou, S., Kakas, A.C. (eds.) PCI 2001. LNCS, vol. 2563, pp. 1–18. Springer, Heidelberg (2003)
Fonseca, M.J., Jorge, J.A.: Indexing high-dimensional data for content-based retrieval in large databases. In: Proceedings of International Conference on Database Systems for Advanced Applications (DASFAA), pp. 267–274 (2003)
Gaede, V., Günther, O.: Multidimensional access methods. ACM Comput. Surv. 30(2), 170–231 (1998)
Guttman, A.: R-trees: A dynamic index structure for spatial searching. In: Proceedings of ACM/SIGMOD Annual Conference on Management of Data (SIGMOD), pp. 47–57 (1984)
Hjaltason, G.R., Samet, H.: Distance browsing in spatial databases. ACM Trans. Database Syst. 24(2), 265–318 (1999)
Hoel, E.G., Samet, H.: Benchmarking spatial join operations with spatial output. In: Proceedings of the 21st International Conference on Very Large Data Bases (VLDB), pp. 606–618 (1995)
Jagadish, H.V., Ooi, B.C., Tan, K.-L., Yu, C., Zhang, R.: iDistance: An adaptive B+-tree based indexing method for nearest neighbor search. ACM Trans. Database Syst. 30, 364–397 (2005)
Kamel, I., Faloutsos, C.: Hilbert R-tree: An improved R-tree using fractals. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 500–509 (1994)
Katayama, N., Satoh, S.: The SR-tree: An index structure for high-dimensional nearest neighbor queries. In: Proceedings of ACM/SIGMOD Annual Conference on Management of Data (SIGMOD), pp. 369–380 (1997)
Kim, Y.J., Patel, J.: Performance comparison of the R*-tree and the quadtree for kNN and distance join queries. IEEE Transactions on Knowledge and Data Engineering 22(7), 1014–1027 (July 2010)
Kothuri, R.K.V., Ravada, S., Abugov, D.: Quadtree and R-tree indexes in Oracle Spatial: A comparison using GIS data. In: Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, SIGMOD 2002, pp. 546–557. ACM, New York (2002)
Leutenegger, S., Lopez, M., Edgington, J.: STR: A simple and efficient algorithm for R-tree packing. In: Proceedings of the 13th International Conference on Data Engineering (ICDE), pp. 497–506 (April 1997)
Lin, K.-I., Jagadish, H.V., Faloutsos, C.: The TV-tree: An index structure for high-dimensional data. VLDB Journal 3(4), 517–542 (1994)
Nievergelt, J., Hinterberger, H., Sevcik, K.C.: The grid file: An adaptable, symmetric multikey file structure. ACM Trans. Database Syst. 9(1), 38–71 (1984)
Ooi, B.C., Tan, K.-L., Yu, C., Bressan, S.: Indexing the edges: A simple and yet efficient approach to high-dimensional indexing. In: Proceedings of the Nineteenth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2000, pp. 166–174. ACM, New York (2000)
Sakurai, Y., Yoshikawa, M., Uemura, S., Kojima, H.: The A-tree: An index structure for high-dimensional spaces using relative approximation. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 516–526 (2000)
Samet, H., Webber, R.E.: Storing a collection of polygons using quadtrees. ACM Trans. Graph. 4(3), 182–222 (1985)
Sellis, T.K., Roussopoulos, N., Faloutsos, C.: The R+-tree: A dynamic index for multi-dimensional objects. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 507–518 (1987)
Shimazaki, H., Shinomoto, S.: Kernel bandwidth optimization in spike rate estimation. Journal of Computational Neuroscience 29(1-2), 171–182 (2010)
Weber, R., Schek, H.-J., Blott, S.: A quantitative analysis and performance study for similarity-search methods in high-dimensional spaces. In: Proceedings of International Conference on Very Large Data Bases (VLDB), pp. 194–205 (1998)
White, D.A., Jain, R.: Similarity indexing with the SS-tree. In: Proceedings of International Conference on Data Engineering (ICDE), pp. 516–523 (1996)
Yianilos, P.N.: Data structures and algorithms for nearest neighbor search in general metric spaces. In: Proceedings of Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 311–321 (1993)
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, J., Lu, J., Fang, Z., Ge, T., Chen, C. (2013). PL-Tree: An Efficient Indexing Method for High-Dimensional Data. In: Nascimento, M.A., et al. Advances in Spatial and Temporal Databases. SSTD 2013. Lecture Notes in Computer Science, vol 8098. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40235-7_11
DOI: https://doi.org/10.1007/978-3-642-40235-7_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40234-0
Online ISBN: 978-3-642-40235-7
eBook Packages: Computer Science; Computer Science (R0)