
Employing GPU architectures for permutation-based indexing

Multimedia Tools and Applications

Abstract

Permutation-based indexing is one of the most popular techniques for the approximate nearest-neighbor search problem in high-dimensional spaces. Due to the exponential growth of multimedia data, the time required to index this data has become a serious constraint. One possible step towards faster index construction is the utilization of massively parallel platforms such as GPGPU architectures. In this paper, we analyze the computational costs of the individual steps of permutation-based index construction in a high-dimensional feature space and summarize our hybrid CPU-GPU solution. The experience gained from this research may be applied to other problems that require computing L_p distances in high-dimensional spaces, parallel top-k selection, or partial sorting of multiple smaller sets. We also provide guidelines on how to balance the workload in hybrid CPU-GPU systems.
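The abstract names the building blocks (L_p distances and top-k selection) without showing how they combine. The following is a minimal sequential sketch, assuming the usual formulation of permutation-based indexing in which each object is represented by the indices of its k nearest pivots, ordered by distance. The function names `lp_distance` and `permutation_prefix` and the sample pivots are hypothetical illustrations, not the authors' GPU implementation.

```python
import heapq


def lp_distance(x, y, p=2.0):
    """Minkowski (L_p) distance between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def permutation_prefix(obj, pivots, k, p=2.0):
    """Indices of the k pivots nearest to obj, closest first."""
    # Step 1: distance from the object to every pivot.
    dists = [(lp_distance(obj, piv, p), i) for i, piv in enumerate(pivots)]
    # Step 2: top-k selection; nsmallest avoids fully sorting all distances.
    return [i for _, i in heapq.nsmallest(k, dists)]


# Hypothetical 2-D pivots; real feature spaces are high-dimensional.
pivots = [(0.0, 0.0), (10.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
prefix = permutation_prefix((1.0, 1.0), pivots, k=2)
print(prefix)
```

For the object (1, 1), the two nearest pivots are those at indices 0 and 2, so the prefix is `[0, 2]`. In an index build, the first step is the one that dominates for high-dimensional data, since it is repeated for every object and every pivot.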


Notes

  1. Each architecture uses a different abbreviation for the multiprocessors (SM in the case of Fermi, SMX in Kepler, and SMM in Maxwell). To avoid ambiguity, we use SMP as a universal abbreviation that does not refer to any particular architecture.

  2. We are using kilo ‘k’ and mega ‘M’ unit prefixes (or numerical suffixes) in the computer science sense – i.e., representing 2^10 and 2^20 multipliers respectively.

  3. Let us note that the sorting-only approach to index construction is significantly slower than the presented approach for all our parameter configurations.

  4. Our prototype solution is currently restricted to x86-compatible architectures that use little-endian number representation.

  5. Maxwell, however, implements 96 kB of shared memory per SMP, so an SMP can accommodate two thread blocks without limiting their shared memory consumption.

  6. The ballot instruction gathers one bit of information from each thread in a warp, assembles the bits into a 32-bit word, and broadcasts that word back to the whole warp.
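The ballot behavior described in note 6 can be illustrated with a small host-side emulation. This is only a sketch of the semantics, assuming a warp of 32 lanes; on the GPU the operation is a single hardware instruction (exposed in CUDA as the `__ballot_sync` intrinsic), and the `ballot` function below is a hypothetical stand-in, not real device code. How the paper applies the instruction is not shown in this excerpt; counting the lanes that satisfy a predicate is one common use.

```python
WARP_SIZE = 32


def ballot(predicates):
    """Emulate a warp ballot: pack one bit per lane into a 32-bit word.

    Returns a list with one entry per lane, because every lane in the
    warp receives the same assembled word.
    """
    assert len(predicates) == WARP_SIZE
    word = 0
    for lane, flag in enumerate(predicates):
        if flag:
            word |= 1 << lane  # bit i of the word belongs to lane i
    return [word] * WARP_SIZE


# Each lane holds one value; ask which lanes hold a value below a threshold.
values = list(range(WARP_SIZE))
threshold = 4
result = ballot([v < threshold for v in values])

# A population count of the word tells every lane how many lanes passed.
count = bin(result[0]).count("1")
print(hex(result[0]), count)
```

Here lanes 0 to 3 pass the predicate, so every lane sees the word `0xf` and a count of 4.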

References

  1. NVIDIA Maxwell GPU Architecture

  2. Alabi T, Blanchard JD, Gordon B, Steinbach R (2012) Fast k-selection algorithms for graphics processing units. Journal of Experimental Algorithmics (JEA) 17:4–2

  3. Amato G, Gennaro C, Savino P (2012) MI-File: using inverted files for scalable approximate similarity search. Multimedia Tools and Applications

  4. Amato G, Savino P (2008) Approximate similarity search in metric spaces using inverted files. In: Proceedings of the 3rd International Conference on Scalable Information Systems. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), p 28

  5. Batcher KE (1968) Sorting networks and their applications. In: Proceedings of the April 30–May 2, 1968, Spring Joint Computer Conference, AFIPS ’68 (Spring). ACM, New York, pp 307–314

  6. Batcher KE (1968) Sorting networks and their applications. In: Proceedings of the April 30–May 2, 1968, Spring Joint Computer Conference. ACM, pp 307–314

  7. Chang D, Jones N, Li M, Ouyang D, Ragade R (2008) Compute pairwise euclidean distances of data points with gpus. In: Proceedings of the IASTED International Symposium on Computational Biology and Bioinformatics (CBB), pp 278–283

  8. Chávez E, Figueroa K, Navarro G. (2008) Effective proximity retrieval by ordering permutations. IEEE Trans Pattern Anal Mach Intell 30(9):1647–1658

  9. Ciaccia P, Patella M (2000) Pac nearest neighbor queries: Approximate and controlled search in high-dimensional and metric spaces. In: Proceedings of the 16th International Conference on Data Engineering (ICDE 2000). IEEE Computer Society, San Diego, pp 244–255

  10. Dagum L, Menon R (1998) OpenMP: an industry standard API for shared-memory programming. Computational Science & Engineering, IEEE 5(1):46–55

  11. Esuli A (2009) Mipai: Using the pp-index to build an efficient and scalable similarity search system. In: Proceedings of the 2009 2nd International Workshop on Similarity Search and Applications. IEEE Computer Society, Washington, pp 146–148

  12. Esuli A (2009) PP-Index: Using permutation prefixes for efficient and scalable approximate similarity search. In: Proceedings of LSDS-IR, 2009

  13. Indyk P, Motwani R (1998) Approximate nearest neighbors: Towards removing the curse of dimensionality. In: Proceedings of the 30th Annual ACM Symposium on Theory of Computing, STOC ’98. ACM, New York, pp 604–613

  14. Jagadish HV, Mendelzon AO, Milo T (1995) Similarity-based queries. In: Proceedings of the Fourteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp 36–45

  15. Jan B, Montrucchio B, Ragusa C, Khan F, Khan O (2012) Fast parallel sorting algorithms on gpus 3

  16. Knuth DE (2003) Sorting and searching. Addison-Wesley

  17. Kruliš M, Falt Z, Bednárek D, Yaghob J (2012) Task scheduling in hybrid CPU-GPU systems. Informacné Technológie-Aplikácie a Teória:17

  18. Kruliš M, Osipyan H, Marchand-Maillet S (2015) Optimizing sorting and top-k selection steps in permutation based indexing on GPUs. In: 4th ADBIS Workshop on GPUs in Databases, GID 2015, Poitiers, France, September 8, 2015, pp 305–317

  19. Kruliš M, Osipyan H, Marchand-Maillet S (2015) Permutation based indexing for high dimensional data on GPU architectures. In: 13th International Workshop on Content-Based Multimedia Indexing, CBMI 2015, Prague, Czech Republic, June 10–12, 2015, pp 1–6

  20. Kushilevitz E, Ostrovsky R, Rabani Y (1998) Efficient search for approximate nearest neighbor in high dimensional spaces. In: Proceedings of the 30th Annual ACM Symposium on Theory of Computing, STOC ’98. ACM, New York, pp 614–623

  21. Li Q, Kecman V, Salman R (2010) A chunking method for euclidean distance matrix calculation on large dataset using multi-gpu. In: Draghici S, Khoshgoftaar TM, Palade V, Pedrycz W, Wani MA, Zhu X (eds) ICMLA. IEEE Computer Society, pp 208–213

  22. Lopresti M, Miranda N, Piccoli F, Reyes N (2013) Solving multiple queries through a permutation index in gpu. In: Journal Computacion y Sistemas, San Luis, pp 341–356

  23. Mohamed H, Marchand-Maillet S (2012) Parallel approaches to permutation-based indexing using inverted files. In: SISAP’12, pp 148–161

  24. Mohamed H, Marchand-Maillet S (2013) Quantized ranking for permutation-based indexing. In: Brisaboa N, Pedreira O, Zezula P (eds) Similarity Search and Applications, volume 8199 of Lecture Notes in Computer Science. Springer, Berlin, pp 103–114

  25. Mohamed H, Osipyan H, Marchand-Maillet S (2014) Multi-core (CPU and GPU) for permutation-based indexing. In: 7th International Conference on Similarity Search and Applications. Los Cabos, Mexico, pp 277–288

  26. Monroe L, Wendelberger J, Michalak S (2011) Randomized selection on the gpu. In: Proceedings of the ACM SIGGRAPH Symposium on High Performance Graphics, HPG ’11. ACM, New York, pp 89–98

  27. Novak D, Kyselak M, Zezula P (2010) On locality-sensitive indexing in generic metric spaces. In: Proceedings of the 3rd International Conference on Similarity Search and Applications, SISAP ’10. ACM, New York, pp 59–66

  28. NVIDIA Kepler GPU Architecture

  29. Patella M, Ciaccia P (2009) Approximate similarity search: A multi-faceted problem. J Discrete Algorithms 7(1):36–48

  30. Peters H, Schulz-Hildebrandt O, Luttenberger N (2010) Fast in-place sorting with cuda based on bitonic sort. In: Parallel Processing and Applied Mathematics. Springer, pp 403–410

  31. Pheatt C (2008) Intel® threading building blocks. Journal of Computing Sciences in Colleges 23(4):298–298

  32. Samet H (2005) Foundations of multidimensional and metric data structures (the morgan kaufmann series in computer graphics and geometric modeling). Morgan Kaufmann Publishers Inc., San Francisco

  33. Sanders J, Kandrot E (2010) CUDA By Example: An Introduction to General-Purpose GPU Programming, 1st edn. Addison-Wesley Professional

  34. Satish N, Harris M, Garland M (2009) Designing efficient sorting algorithms for manycore gpus. In: Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing, IPDPS ’09. IEEE Computer Society, Washington, pp 1–10

  35. Tellez ES, Chávez E, Camarena-Ibarrola A (2009) A brief index for proximity searching. In: Bayro-Corrochano E, Eklundh J-O (eds) CIARP, volume 5856 of Lecture Notes in Computer Science. Springer, pp 529–536

  36. Ye X, Fan D, Lin W, Yuan N, Ienne P (2010) High performance comparison-based sorting algorithm on many-core gpus. In: IPDPS’10, pp 1–10

Acknowledgments

This paper was supported by the Czech Science Foundation (GAČR) project no. P103-14-14292P and partly by the Swiss National Foundation (SNF) under the interdisciplinary project MAAYA (grant number 144238).

Author information

Corresponding author

Correspondence to Martin Kruliš.

Additional information

This paper is an extended version of two previously published conference papers by Kruliš et al., which addressed the two most important aspects of PBI construction – the distance computations [19] and the GPU sorting and top-k selection problem [18]. In this work, we provide the complete solution, described in much more detail. Furthermore, we have explored several other approaches to the studied problems and extended our experiments to cover a much wider range of parameters.

About this article

Cite this article

Kruliš, M., Osipyan, H. & Marchand-Maillet, S. Employing GPU architectures for permutation-based indexing. Multimed Tools Appl 76, 11859–11887 (2017). https://doi.org/10.1007/s11042-016-3677-7

  • DOI: https://doi.org/10.1007/s11042-016-3677-7
