Interactive-Time Similarity Search for Large Image Collections Using Parallel VA-Files
In digital libraries, nearest-neighbor search (NN-search) plays a key role in content-based retrieval over multimedia objects. However, the performance of existing NN-search techniques is unsatisfactory for large collections and high-dimensional object representations. To obtain interactive response times, we take a linear algorithm that works on approximations of the vectors and parallelize it. More specifically, we parallelize VA-File-based NN-search in a Network of Workstations (NOW). This approach reduces search time to a reasonable level for large collections. The best speedup we have observed is a factor of almost 30, for a NOW with only three components and 900 MB of feature data. Achieving this requires a number of design decisions, in particular when load dynamism and heterogeneity of components are taken into account. Our contribution is to address these design issues.
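The general idea of a parallel, approximation-based linear scan can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes feature vectors with components in [0, 1), uniform quantization with a fixed number of bits per dimension, Euclidean distance, and a static round-robin partitioning, and it uses threads as a stand-in for the workstations of a NOW. All names (`approximate`, `lower_bound`, `local_knn`, `parallel_knn`, `BITS`) are hypothetical.

```python
import heapq
import math
import random
from concurrent.futures import ThreadPoolExecutor

BITS = 4                 # assumed bits per dimension
CELLS = 1 << BITS        # resulting cells per dimension


def approximate(vec):
    """Quantize a vector with components in [0, 1) to per-dimension cell indices."""
    return tuple(min(int(x * CELLS), CELLS - 1) for x in vec)


def lower_bound(query, approx):
    """Smallest possible Euclidean distance from query to any point in the cell."""
    total = 0.0
    for q, cell in zip(query, approx):
        lo, hi = cell / CELLS, (cell + 1) / CELLS
        gap = lo - q if q < lo else q - hi if q > hi else 0.0
        total += gap * gap
    return math.sqrt(total)


def local_knn(query, k, ids, vectors, approxs):
    """Filter-and-refine k-NN over one partition; returns sorted (distance, id)."""
    # Phase 1: scan only the compact approximations and rank by lower bound.
    candidates = sorted((lower_bound(query, a), i) for i, a in enumerate(approxs))
    # Phase 2: fetch exact vectors in order of increasing lower bound.
    best = []  # max-heap on distance, stored as (-distance, id)
    for bound, i in candidates:
        if len(best) == k and bound > -best[0][0]:
            break  # no remaining candidate can beat the current k-th best
        dist = math.dist(query, vectors[i])
        if len(best) < k:
            heapq.heappush(best, (-dist, ids[i]))
        elif dist < -best[0][0]:
            heapq.heapreplace(best, (-dist, ids[i]))
    return sorted((-d, i) for d, i in best)


def parallel_knn(query, k, vectors, workers=3):
    """Partition round-robin, search partitions concurrently, merge the results."""
    parts = []
    for w in range(workers):
        ids = list(range(w, len(vectors), workers))
        vecs = [vectors[i] for i in ids]
        # In a real system the approximations would be precomputed and stored.
        parts.append((ids, vecs, [approximate(v) for v in vecs]))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: local_knn(query, k, *p), parts)
        return heapq.nsmallest(k, (pair for res in results for pair in res))
```

Because each lower bound never exceeds the true distance, the early exit in the second phase does not discard any true neighbor, so the merged result matches an exhaustive scan. The sketch ignores the load dynamism and heterogeneity issues that the abstract names as the actual design challenges.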
Keywords: Digital Library, Main Memory, Search Time, Feature Data, Search Cost