Abstract
Permutation-based indexing is one of the most popular techniques for the approximate nearest-neighbor search problem in high-dimensional spaces. Due to the exponential growth of multimedia data, the time required to index this data has become a serious constraint. One possible step towards faster index construction is the utilization of massively parallel platforms such as GPGPU architectures. In this paper, we analyze the computational costs of the individual steps of permutation-based index construction in a high-dimensional feature space and summarize our hybrid CPU-GPU solution. The experience gained from this research may be applied to other problems that require computing L_p distances in high-dimensional spaces, parallel top-k selection, or partial sorting of multiple smaller sets. We also provide guidelines on how to balance the workload in hybrid CPU-GPU systems.
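To make the index-construction steps named above concrete, here is a minimal sequential sketch in Python. It is not the authors' hybrid CPU-GPU implementation; it assumes the common permutation-based scheme in which each data object is represented by the indices of its k nearest pivots (reference objects), ordered by increasing L_p distance. The function names `lp_distance`, `permutation_prefix`, and `build_pbi` are hypothetical. The two costly steps the abstract mentions map onto the distance loop and the top-k selection.

```python
import heapq
from typing import List, Sequence


def lp_distance(x: Sequence[float], y: Sequence[float], p: float = 2.0) -> float:
    """Minkowski L_p distance between two equally long feature vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimensionality")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def permutation_prefix(obj: Sequence[float], pivots: Sequence[Sequence[float]],
                       k: int, p: float = 2.0) -> List[int]:
    """Indices of the k pivots closest to obj, from nearest to farthest."""
    # Step 1: one L_p distance per (object, pivot) pair.
    dists = [(lp_distance(obj, pv, p), i) for i, pv in enumerate(pivots)]
    # Step 2: top-k selection; only the k smallest are needed, so there is
    # no full sort of all pivot distances. Ties are broken by pivot index.
    return [i for _, i in heapq.nsmallest(k, dists)]


def build_pbi(objects: Sequence[Sequence[float]], pivots: Sequence[Sequence[float]],
              k: int, p: float = 2.0) -> List[List[int]]:
    """Hypothetical index builder: one permutation prefix per data object."""
    return [permutation_prefix(o, pivots, k, p) for o in objects]


pivots = [[0, 0], [10, 0], [0, 10], [10, 10]]
objects = [[1, 2], [9, 8]]
print(build_pbi(objects, pivots, k=2))  # [[0, 2], [3, 1]]
```

Because the p-th root is monotonic, an implementation that only needs the ranking could skip it; the sketch keeps it so that `lp_distance` returns a true L_p distance. Every (object, pivot) distance is independent of the others, which is what makes this step a natural fit for massively parallel hardware.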
Notes
Each architecture uses a different abbreviation for its multiprocessors (SM in Fermi, SMX in Kepler, and SMM in Maxwell). To avoid ambiguity, we use SMP as a universal abbreviation that does not refer to any particular architecture.
We use the kilo ‘k’ and mega ‘M’ unit prefixes (or numerical suffixes) in the computer-science sense, i.e., representing the multipliers 2^10 and 2^20, respectively.
Let us note that the sorting-only approach to index construction is significantly slower than the presented approach for all our parameter configurations.
Our prototype solution is currently restricted to x86-compatible architectures that use little-endian number representation.
However, Maxwell provides 96 kB of shared memory per SMP, so an SMP can accommodate two thread blocks without limiting their shared memory consumption.
The ballot instruction gathers one bit of information from each thread in a warp, assembles these bits into a 32-bit word, and broadcasts the word back to the whole warp.
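The semantics of the ballot operation can be illustrated by a small host-side emulation in Python. In CUDA the operation is exposed as the `__ballot_sync(mask, predicate)` intrinsic (`__ballot` before CUDA 9); the `ballot` function below is a hypothetical stand-in that models only the data movement, not warp execution or the participation mask.

```python
from typing import List, Sequence

WARP_SIZE = 32


def ballot(predicates: Sequence[bool]) -> List[int]:
    """Emulate a warp-wide ballot: lane i contributes bit i of a 32-bit word,
    and every lane receives the same assembled word."""
    if len(predicates) != WARP_SIZE:
        raise ValueError("expected one predicate per lane of a 32-thread warp")
    word = 0
    for lane, pred in enumerate(predicates):
        if pred:
            word |= 1 << lane  # the lane index selects the bit position
    return [word] * WARP_SIZE  # broadcast the word back to all lanes


preds = [lane % 2 == 0 for lane in range(WARP_SIZE)]  # even lanes vote true
votes = ballot(preds)
print(hex(votes[0]))             # 0x55555555
print(bin(votes[0]).count("1"))  # 16
```

Counting the set bits of the broadcast word, as the last line does (the device-side counterpart is `__popc`), lets every thread learn how many lanes satisfied the predicate without any shared-memory traffic.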
References
NVIDIA Maxwell GPU Architecture
Alabi T, Blanchard JD, Gordon B, Steinbach R (2012) Fast k-selection algorithms for graphics processing units. Journal of Experimental Algorithmics (JEA) 17:4–2
Amato G, Gennaro C, Savino P (2012) MI-File: using inverted files for scalable approximate similarity search. Multimedia Tools and Applications
Amato G, Savino P (2008) Approximate similarity search in metric spaces using inverted files. In: Proceedings of the 3rd International Conference on Scalable Information Systems. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering), p 28
Batcher KE (1968) Sorting networks and their applications. In: Proceedings of the April 30–May 2, 1968, Spring Joint Computer Conference, AFIPS ’68 (Spring). ACM, New York, pp 307–314
Batcher KE (1968) Sorting networks and their applications. In: Proceedings of the April 30–May 2, 1968, Spring Joint Computer Conference. ACM, pp 307–314
Chang D, Jones N, Li M, Ouyang D, Ragade R (2008) Compute pairwise Euclidean distances of data points with GPUs. In: Proceedings of the IASTED International Symposium on Computational Biology and Bioinformatics (CBB), pp 278–283
Chávez E, Figueroa K, Navarro G (2008) Effective proximity retrieval by ordering permutations. IEEE Trans Pattern Anal Mach Intell 30(9):1647–1658
Ciaccia P, Patella M (2000) PAC nearest neighbor queries: Approximate and controlled search in high-dimensional and metric spaces. In: Proceedings of the 16th International Conference on Data Engineering (ICDE 2000). IEEE Computer Society, San Diego, pp 244–255
Dagum L, Menon R (1998) OpenMP: an industry standard API for shared-memory programming. IEEE Computational Science & Engineering 5(1):46–55
Esuli A (2009) MiPai: Using the PP-Index to build an efficient and scalable similarity search system. In: Proceedings of the 2009 2nd International Workshop on Similarity Search and Applications. IEEE Computer Society, Washington, pp 146–148
Esuli A (2009) PP-Index: Using permutation prefixes for efficient and scalable approximate similarity search. In: Proceedings of LSDS-IR, 2009
Indyk P, Motwani R (1998) Approximate nearest neighbors: Towards removing the curse of dimensionality. In: Proceedings of the 30th Annual ACM Symposium on Theory of Computing, STOC ’98. ACM, New York, pp 604–613
Jagadish HV, Mendelzon AO, Milo T (1995) Similarity-based queries. In: Proceedings of the Fourteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp 36–45
Jan B, Montrucchio B, Ragusa C, Khan F, Khan O (2012) Fast parallel sorting algorithms on gpus 3
Knuth DE (2003) Sorting and searching. Addison-Wesley
Kruliš M, Falt Z, Bednárek D, Yaghob J (2012) Task scheduling in hybrid CPU-GPU systems. Informacné Technológie-Aplikácie a Teória:17
Kruliš M, Osipyan H, Marchand-Maillet S (2015) Optimizing sorting and top-k selection steps in permutation based indexing on GPUs. In: 4th ADBIS Workshop on GPUs in Databases, GID 2015, Poitiers, France, September 8, 2015, pp 305–317
Kruliš M, Osipyan H, Marchand-Maillet S (2015) Permutation based indexing for high dimensional data on GPU architectures. In: 13th International Workshop on Content-Based Multimedia Indexing, CBMI 2015, Prague, Czech Republic, June 10–12, 2015, pp 1–6
Kushilevitz E, Ostrovsky R, Rabani Y (1998) Efficient search for approximate nearest neighbor in high dimensional spaces. In: Proceedings of the 30th Annual ACM Symposium on Theory of Computing, STOC ’98. ACM, New York, pp 614–623
Li Q, Kecman V, Salman R (2010) A chunking method for Euclidean distance matrix calculation on large dataset using multi-GPU. In: Draghici S, Khoshgoftaar TM, Palade V, Pedrycz W, Wani MA, Zhu X (eds) ICMLA. IEEE Computer Society, pp 208–213
Lopresti M, Miranda N, Piccoli F, Reyes N (2013) Solving multiple queries through a permutation index in GPU. In: Journal Computacion y Sistemas, San Luis, pp 341–356
Mohamed H, Marchand-Maillet S (2012) Parallel approaches to permutation-based indexing using inverted files. In: SISAP’12, pp 148–161
Mohamed H, Marchand-Maillet S (2013) Quantized ranking for permutation-based indexing. In: Brisaboa N, Pedreira O, Zezula P (eds) Similarity Search and Applications, volume 8199 of Lecture Notes in Computer Science. Springer, Berlin, pp 103–114
Mohamed H, Osipyan H, Marchand-Maillet S (2014) Multi-core (CPU and GPU) for permutation-based indexing. In: 7th International Conference on Similarity Search and Applications, Los Cabos, Mexico, pp 277–288
Monroe L, Wendelberger J, Michalak S (2011) Randomized selection on the GPU. In: Proceedings of the ACM SIGGRAPH Symposium on High Performance Graphics, HPG ’11. ACM, New York, pp 89–98
Novak D, Kyselak M, Zezula P (2010) On locality-sensitive indexing in generic metric spaces. In: Proceedings of the 3rd International Conference on Similarity Search and Applications, SISAP ’10. ACM, New York, pp 59–66
NVIDIA Kepler GPU Architecture
Patella M, Ciaccia P (2009) Approximate similarity search: A multi-faceted problem. J Discrete Algorithms 7(1):36–48
Peters H, Schulz-Hildebrandt O, Luttenberger N (2010) Fast in-place sorting with CUDA based on bitonic sort. In: Parallel Processing and Applied Mathematics. Springer, pp 403–410
Pheatt C (2008) Intel® Threading Building Blocks. Journal of Computing Sciences in Colleges 23(4):298
Samet H (2005) Foundations of multidimensional and metric data structures (The Morgan Kaufmann Series in Computer Graphics and Geometric Modeling). Morgan Kaufmann Publishers Inc., San Francisco
Sanders J, Kandrot E (2010) CUDA By Example: An Introduction to General-Purpose GPU Programming, 1st edn. Addison-Wesley Professional
Satish N, Harris M, Garland M (2009) Designing efficient sorting algorithms for manycore GPUs. In: Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing, IPDPS ’09. IEEE Computer Society, Washington, pp 1–10
Tellez ES, Chávez E, Camarena-Ibarrola A (2009) A brief index for proximity searching. In: Bayro-Corrochano E, Eklundh J-O (eds) CIARP, volume 5856 of Lecture Notes in Computer Science. Springer, pp 529–536
Ye X, Fan D, Lin W, Yuan N, Ienne P (2010) High performance comparison-based sorting algorithm on many-core GPUs. In: IPDPS’10, pp 1–10
Acknowledgments
This paper was supported by the Czech Science Foundation (GAČR) project no. P103-14-14292P and partly by the Swiss National Foundation (SNF) under the interdisciplinary project MAAYA (grant number 144238).
Additional information
This paper is an extended version of two previously published conference papers by Kruliš et al., which addressed the two most important aspects of PBI construction: the distance computations [19] and the GPU sorting and top-k selection problems [18]. In this work, we provide the complete solution, described in much more detail. Furthermore, we have explored several other approaches to the studied problems and extended our experiments to cover a much wider range of parameters.
About this article
Cite this article
Kruliš, M., Osipyan, H. & Marchand-Maillet, S. Employing GPU architectures for permutation-based indexing. Multimed Tools Appl 76, 11859–11887 (2017). https://doi.org/10.1007/s11042-016-3677-7
DOI: https://doi.org/10.1007/s11042-016-3677-7