
The Journal of Supercomputing, Volume 62, Issue 3, pp 1362–1384

Distributed high dimensional indexing for k-NN search

  • Hyun-Hwa Choi
  • Mi-Young Lee
  • Kyu-Chul Lee

Abstract

Although conventional index structures provide various nearest-neighbor search algorithms for high-dimensional data, search performance still needs to improve, and the index must also scale to large datasets. To meet these requirements, we propose a distributed high-dimensional index structure based on cluster systems, called the Distributed Vector Approximation-tree (DVA-tree): a two-level structure consisting of a hybrid spill-tree and Vector Approximation files (VA-files). We also describe the algorithms used to construct the DVA-tree over multiple machines and to perform distributed k-nearest neighbor (k-NN) searches. To evaluate the performance of the DVA-tree, we conduct an experimental study using both real and synthetic datasets. The results show that the proposed method has significant performance advantages over existing index structures on different kinds of datasets.
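The abstract describes the two-level design only at a high level. The sketch below is a simplified illustration under stated assumptions, not the authors' DVA-tree algorithm. It uses a single spill-style split with overlap width `tau` across two hypothetical nodes, which is far simpler than a full hybrid spill-tree. Each node keeps a VA-file with an assumed 3 bits per dimension and answers k-NN queries by visiting vectors in order of their grid-cell lower bound. A coordinator then merges the local top-k lists. All names (`VAFile`, `build_cluster`, `search`, `BITS`, `tau`) are hypothetical.

```python
import heapq
import math
import random

BITS = 3            # assumed bits per dimension in each VA-file
CELLS = 1 << BITS   # grid cells per dimension


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class VAFile:
    """Per-node VA-file (hypothetical): one grid cell per vector plus the exact vectors."""

    def __init__(self, items, lo, hi):
        self.items = items          # list of (id, vector)
        self.lo, self.hi = lo, hi   # per-dimension bounds shared by all nodes
        self.width = [(hi_d - lo_d) / CELLS or 1.0 for lo_d, hi_d in zip(lo, hi)]
        self.approx = [tuple(self._cell(x, d) for d, x in enumerate(v)) for _, v in items]

    def _cell(self, x, d):
        return max(0, min(CELLS - 1, int((x - self.lo[d]) / self.width[d])))

    def _lower_bound(self, q, cells):
        # Distance from q to the cell's box: never larger than the distance to any vector in it.
        s = 0.0
        for d, c in enumerate(cells):
            left = self.lo[d] + c * self.width[d]
            right = self.hi[d] if c == CELLS - 1 else left + self.width[d]
            gap = max(left - q[d], 0.0, q[d] - right)
            s += gap * gap
        return math.sqrt(s)

    def knn(self, q, k):
        # Phase 1 scans only the approximations; phase 2 computes exact distances in bound order.
        order = sorted((self._lower_bound(q, a), i) for i, a in enumerate(self.approx))
        best = []  # max-heap of (-distance, id) holding the current k best
        for lb, i in order:
            if len(best) == k and lb >= -best[0][0]:
                break  # every remaining bound is at least the k-th best exact distance
            pid, v = self.items[i]
            heapq.heappush(best, (-dist(q, v), pid))
            if len(best) > k:
                heapq.heappop(best)
        return sorted((-nd, pid) for nd, pid in best)


def build_cluster(items, tau):
    """Split on the widest dimension at the median; points within tau of it go to both nodes."""
    dims = len(items[0][1])
    lo = [min(v[d] for _, v in items) for d in range(dims)]
    hi = [max(v[d] for _, v in items) for d in range(dims)]
    d = max(range(dims), key=lambda j: hi[j] - lo[j])
    m = sorted(v[d] for _, v in items)[len(items) // 2]
    left = [(i, v) for i, v in items if v[d] <= m + tau]
    right = [(i, v) for i, v in items if v[d] >= m - tau]
    return d, m, [VAFile(left, lo, hi), VAFile(right, lo, hi)]


def search(cluster, q, k, exact=True):
    d, m, nodes = cluster
    if not exact:
        # Defeatist (spill-tree style) routing: query only the node on q's side of the split.
        return nodes[0 if q[d] <= m else 1].knn(q, k)
    # Fan out to every node, then merge local top-k lists, dropping overlap duplicates.
    merged = {}
    for node in nodes:
        for dd, pid in node.knn(q, k):
            merged[pid] = dd
    return sorted((dd, pid) for pid, dd in merged.items())[:k]


if __name__ == "__main__":
    random.seed(0)
    data = [(i, tuple(random.random() for _ in range(8))) for i in range(500)]
    cluster = build_cluster(data, tau=0.05)
    q = tuple(random.random() for _ in range(8))
    print("exact :", search(cluster, q, 5))
    print("approx:", search(cluster, q, 5, exact=False))
```

With `exact=True`, the merged result matches a brute-force scan, because every vector lives on at least one node and each node returns its own exact top-k. With `exact=False`, only one node is queried. This is cheaper but approximate, and the overlap `tau` is meant to reduce how often the true neighbors fall on the other side of the split.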

Keywords

High-dimensional indexing, Distributed indexing, Approximate k-NN query, Cluster system

Notes

Acknowledgements

This work was supported by the IT R&D program of MKE/KEIT. [10038768, The Development of Supercomputing System for the Genome Analysis].


Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Electronics and Telecommunications Research Institute, Daejeon, Rep. of Korea
  2. Chungnam National University, Daejeon, Rep. of Korea
