EBS k-d Tree: An Entropy Balanced Statistical k-d Tree for Image Databases with Ground-Truth Labels
In this paper we present a new image database indexing structure, the Entropy Balanced Statistical (EBS) k-d tree. This indexing mechanism uses the statistical properties and ground-truth labels of image data for efficient and accurate searches. It is particularly valuable in medical and biological image database retrieval, where ground-truth labels are available and archived with the images. The EBS k-d tree extends the statistical k-d tree by attempting to optimize a multi-dimensional decision tree based on the fundamental principles from which it is constructed. Our approach is to develop and validate the notion of an entropy-balanced, statistically based decision tree. We show that making balanced split decisions while growing the tree improves the average search depth and usually improves the worst-case search depth dramatically. Furthermore, we developed a method for linking the tree leaves into a non-linear structure to increase the accuracy of n-nearest-neighbor similarity search. We have applied the index to a large-scale medical diagnostic image database and have shown increases in search speed and accuracy over both an ordinary distance-based search and the original statistical k-d tree index.
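The abstract does not state the exact split criterion, so the following is only a minimal sketch of one plausible reading of an "entropy balanced" split decision: for a single feature, choose the threshold at which the Shannon entropies of the ground-truth labels on the two sides are as close to equal as possible. The function names `entropy` and `entropy_balanced_split`, and the balancing rule itself, are illustrative assumptions and not the authors' published method.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels; 0.0 if empty."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def entropy_balanced_split(values, labels):
    """Hypothetical split rule (an assumption, not the paper's criterion).

    Scans the midpoints between consecutive distinct feature values and
    returns (threshold, gap), where gap is the absolute difference between
    the label entropy of the left and right partitions. The threshold with
    the smallest gap is chosen; the first such threshold wins ties.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    best_threshold, best_gap = None, float("inf")
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal feature values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gap = abs(entropy(left) - entropy(right))
        if gap < best_gap:
            best_threshold, best_gap = (lo + hi) / 2, gap
    return best_threshold, best_gap


if __name__ == "__main__":
    # Toy data: one feature value and one ground-truth label per image.
    print(entropy_balanced_split([1, 2, 3, 4], ["a", "a", "b", "b"]))
```

In a full tree builder, a rule of this kind would be evaluated per dimension at each node and applied recursively to both partitions; how the EBS k-d tree combines it with the statistical k-d tree's class statistics is not specified in the abstract.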
Keywords: statistical k-d tree, entropy, multi-dimensional index, image database, content-based image retrieval (CBIR)