Improving the query performance of high-dimensional index structures by bulk load operations

  • Conference paper
Advances in Database Technology — EDBT'98 (EDBT 1998)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1377)

Abstract

In this paper, we propose a new bulk-loading technique for high-dimensional indexes, which represent an important component of multimedia database systems. Since it is very inefficient to construct an index for a large amount of data by dynamic insertion of single objects, there is an increasing interest in bulk-loading techniques. In contrast to previous approaches, our technique exploits a priori knowledge of the complete data set to improve both construction time and query performance. Our algorithm operates in a manner similar to the Quicksort algorithm and has an average runtime complexity of O(n log n). We additionally improve the query performance by optimizing the shape of the bounding boxes, by completely avoiding overlap, and by clustering the pages on disk. As we show analytically, the split strategy typically used in dynamic index structures, splitting the data space at the 50%-quantile, results in poor query performance in high-dimensional spaces. Therefore, we use a sophisticated unbalanced split strategy, which leads to a much better space partitioning. An exhaustive experimental evaluation shows that our technique clearly outperforms both classic index construction and competing bulk-loading techniques. In comparison with dynamic index construction, we achieve a speed-up factor of up to 588 for the construction time. The constructed index causes up to 16.88 times fewer page accesses and is up to 198 times faster (real time) in query processing.
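
The abstract gives no pseudocode, so the following Python block is only a minimal sketch of the general idea it describes: a Quicksort-like partitioning step applied recursively, with a split position that need not be the 50%-quantile. The names `quickselect`, `bulk_load`, `capacity` and `fraction` are hypothetical. Choosing the split dimension by largest extent and snapping the split position to a multiple of the page capacity are assumptions made for illustration, not details taken from the paper. The sketch produces leaf pages only; it does not build directory nodes, optimize bounding-box shapes, or cluster pages on disk.

```python
import random


def quickselect(points, lo, hi, k, dim):
    """Reorder points[lo:hi] in place so that, along coordinate `dim`,
    no point in points[lo:k] exceeds any point in points[k:hi]."""
    hi -= 1  # work with an inclusive upper bound
    while lo < hi:
        pivot = points[random.randint(lo, hi)][dim]
        i, j = lo, hi
        while i <= j:  # Hoare-style partition around the pivot value
            while points[i][dim] < pivot:
                i += 1
            while points[j][dim] > pivot:
                j -= 1
            if i <= j:
                points[i], points[j] = points[j], points[i]
                i += 1
                j -= 1
        if k <= j:      # position k lies in the left part
            hi = j
        elif k >= i:    # position k lies in the right part
            lo = i
        else:           # position k already holds its final element
            return


def bulk_load(points, capacity, fraction=0.5):
    """Partition `points` (a list of equal-length tuples) into leaf pages of
    at most `capacity` points by recursive, possibly unbalanced, splits."""
    points = list(points)  # work on a copy; the caller's list is untouched
    pages = []

    def split(lo, hi):
        n = hi - lo
        if n <= capacity:
            pages.append(points[lo:hi])
            return

        # Assumption: split along the dimension with the largest extent.
        def extent(d):
            values = [p[d] for p in points[lo:hi]]
            return max(values) - min(values)

        dim = max(range(len(points[lo])), key=extent)
        # Assumption: snap the split to a multiple of the page capacity,
        # keeping at least one point on each side.
        full_pages = min(max(1, round(fraction * n / capacity)),
                         (n - 1) // capacity)
        k = lo + full_pages * capacity
        quickselect(points, lo, hi, k, dim)
        split(lo, k)
        split(k, hi)

    if points:
        split(0, len(points))
    return pages
```

With `fraction=0.5` every split is roughly balanced. A smaller value such as `fraction=0.1` gives an unbalanced partitioning in the spirit of the abstract, although the paper's actual rule for choosing the split position is not reproduced here. Because each split separates the two sides along one coordinate, the bounding boxes of the resulting pages share at most boundary points.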

Editor information

Hans-Jörg Schek, Gustavo Alonso, Felix Saltor, Isidro Ramos

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Berchtold, S., Böhm, C., Kriegel, HP. (1998). Improving the query performance of high-dimensional index structures by bulk load operations. In: Schek, HJ., Alonso, G., Saltor, F., Ramos, I. (eds) Advances in Database Technology — EDBT'98. EDBT 1998. Lecture Notes in Computer Science, vol 1377. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0100987

  • DOI: https://doi.org/10.1007/BFb0100987

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64264-0

  • Online ISBN: 978-3-540-69709-1

  • eBook Packages: Springer Book Archive
