Abstract
We propose a file structure to index high-dimensionality data, which are typically points in some feature space. The idea is to use only a few of the features, bringing in additional features only when their extra discriminatory power is absolutely necessary. We present in detail the design of our tree structure and the associated algorithms that handle such "varying length" feature vectors. Finally, we report simulation results comparing the proposed structure with the R*-tree, which is one of the most successful methods for low-dimensionality spaces. The results illustrate the superiority of our method, which saves up to 80% in disk accesses.
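The core intuition, that a few leading features can often rule out a candidate and further features are consulted only when needed, can be illustrated with a simple linear scan. This is a minimal sketch, not the TV-tree's node layout or its insertion and search algorithms. It assumes features are already ordered by decreasing discriminatory power and that distance is Euclidean. The function name `nearest_neighbor` and its `min_features` parameter are hypothetical.

```python
import math
from typing import Sequence, Tuple


def nearest_neighbor(
    query: Sequence[float],
    points: Sequence[Sequence[float]],
    min_features: int = 1,
) -> Tuple[int, float, int]:
    """Linear-scan nearest neighbor that reads features only as needed.

    Assumes (hypothetically) that features are ordered by decreasing
    discriminatory power. Returns (index of nearest point, Euclidean
    distance, total number of feature values read).
    """
    dim = len(query)
    best_idx, best_sq = -1, math.inf
    features_read = 0
    for i, p in enumerate(points):
        d_sq = 0.0
        j = 0
        while j < dim:
            d_sq += (query[j] - p[j]) ** 2
            j += 1
            # The partial sum is a lower bound on the full squared distance,
            # since every remaining term is >= 0. Once it reaches the best
            # so far, the remaining features cannot help this candidate.
            if j >= min_features and d_sq >= best_sq:
                break
        features_read += j
        # An early break implies d_sq >= best_sq, so only fully scanned
        # candidates can pass this test.
        if d_sq < best_sq:
            best_idx, best_sq = i, d_sq
    return best_idx, math.sqrt(best_sq), features_read


# Usage: the first feature alone rules out two of the four candidates.
points = [
    (0.0, 0.0, 0.0, 0.0),
    (5.0, 0.1, 0.0, 0.0),
    (1.0, 1.0, 0.0, 0.0),
    (0.2, 0.1, 0.1, 0.0),
]
query = (0.25, 0.1, 0.1, 0.0)
idx, dist, read = nearest_neighbor(query, points)
print(idx, round(dist, 3), read)  # 3 0.05 10 (a full scan reads 16)
```

Pruning never changes the answer, because a partial sum that already reaches the best squared distance cannot be beaten once more non-negative terms are added. The savings depend on the feature ordering: if every feature carries roughly equal variance, few candidates are eliminated early.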
Cite this article
Lin, KI., Jagadish, H.V. & Faloutsos, C. The TV-tree: An index structure for high-dimensional data. VLDB Journal 3, 517–542 (1994). https://doi.org/10.1007/BF01231606