Impact of Storage Technology on the Efficiency of Cluster-Based High-Dimensional Index Creation

  • Gylfi Þór Gudmundsson
  • Laurent Amsaleg
  • Björn Þór Jónsson
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7240)


The scale of multimedia data collections is expanding at a very fast rate. To cope with this growth, the high-dimensional indexing methods used for content-based multimedia retrieval must adapt gracefully to secondary storage. Recent progress in storage technology, however, means that algorithm designers must now cope with a spectrum of secondary storage solutions, ranging from traditional magnetic hard drives to state-of-the-art solid state disks. We study the impact of storage technology on a simple, prototypical high-dimensional indexing method for large-scale query processing. We show that while the algorithm implementation deeply impacts the performance of the indexing method, the choice of underlying storage technology is equally important.


Keywords: Storage Technology, Magnetic Disk, Average Cluster Size, Secondary Storage, Query Vector
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Gylfi Þór Gudmundsson (1)
  • Laurent Amsaleg (1, 2)
  • Björn Þór Jónsson (3)

  1. INRIA, Rennes, France
  2. CNRS, Rennes, France
  3. School of Computer Science, Reykjavík University, Reykjavík, Iceland
