Impact of Storage Technology on the Efficiency of Cluster-Based High-Dimensional Index Creation

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNISA, volume 7240)

Abstract

The scale of multimedia data collections is expanding at a very fast rate. In order to cope with this growth, the high-dimensional indexing methods used for content-based multimedia retrieval must adapt gracefully to secondary storage. Recent progress in storage technology, however, means that algorithm designers must now cope with a spectrum of secondary storage solutions, ranging from traditional magnetic hard drives to state-of-the-art solid state disks. We study the impact of storage technology on a simple, prototypical high-dimensional indexing method for large scale query processing. We show that while the algorithm implementation deeply impacts the performance of the indexing method, the choice of underlying storage technology is equally important.

Keywords

  • Storage Technology
  • Magnetic Disk
  • Average Cluster Size
  • Secondary Storage
  • Query Vector

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gudmundsson, G.Þ., Amsaleg, L., Jónsson, B.Þ. (2012). Impact of Storage Technology on the Efficiency of Cluster-Based High-Dimensional Index Creation. In: Yu, H., Yu, G., Hsu, W., Moon, YS., Unland, R., Yoo, J. (eds) Database Systems for Advanced Applications. DASFAA 2012. Lecture Notes in Computer Science, vol 7240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29023-7_6

  • DOI: https://doi.org/10.1007/978-3-642-29023-7_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29022-0

  • Online ISBN: 978-3-642-29023-7

  • eBook Packages: Computer Science, Computer Science (R0)