Distributed and Parallel Databases

, Volume 30, Issue 3–4, pp 179–207 | Cite as

Combining CPU and GPU architectures for fast similarity search

  • Martin Kruliš
  • Tomáš Skopal
  • Jakub Lokoč
  • Christian Beecks
Article

Abstract

The Signature Quadratic Form Distance on feature signatures represents a flexible distance-based similarity model for effective content-based multimedia retrieval. Although metric indexing approaches are able to speed up query processing by two orders of magnitude, their applicability to large-scale multimedia databases containing billions of images is still a challenging issue. In this paper, we propose a parallel approach that balances the utilization of CPU and many-core GPUs for efficient similarity search with the Signature Quadratic Form Distance. In particular, we show how to process multiple distance computations and other parts of the search procedure in parallel, achieving maximal performance of the combined CPU/GPU system. The experimental evaluation demonstrates that our approach implemented on a common workstation with 2 GPU cards outperforms traditional parallel implementation on a high-end 48-core NUMA server in terms of efficiency almost by an order of magnitude. If we consider also the price of the high-end server that is ten times higher than that of the GPU workstation then, based on price/performance ratio, the GPU-based similarity search beats the CPU-based solution by almost two orders of magnitude. Although proposed for the SQFD, our approach of fast GPU-based similarity search is applicable for any distance function that is efficiently parallelizable in the SIMT execution model.

Keywords

Similarity search Database indexing Parallel computing GPU Pivot table Metric Ptolemaic Multimedia databases 

References

  1. 1.
    Beecks, C., Lokoč, J., Seidl, T., Skopal, T.: Indexing the signature quadratic form distance for efficient content-based multimedia retrieval. In: Proc. ACM Int. Conf. on Multimedia Retrieval, pp. 24:1–24:8 (2011) Google Scholar
  2. 2.
    Beecks, C., Seidl, T.: On stability of adaptive similarity measures for content-based image retrieval. In: MMM, pp. 346–357 (2012) Google Scholar
  3. 3.
    Beecks, C., Uysal, M.S., Seidl, T.: Signature quadratic form distances for content-based similarity. In: Proc. ACM Multimedia, pp. 697–700 (2009) Google Scholar
  4. 4.
    Beecks, C., Uysal, M.S., Seidl, T.: A comparative study of similarity measures for content-based multimedia retrieval. In: Proc. IEEE ICME, pp. 1552–1557 (2010) Google Scholar
  5. 5.
    Beecks, C., Uysal, M.S., Seidl, T.: Signature quadratic form distance. In: Proc. ACM CIVR, pp. 438–445 (2010) Google Scholar
  6. 6.
    Bolettieri, P., Esuli, A., Falchi, F., Lucchese, C., Perego, R., Piccioli, T., Rabitti, F.: CoPhIR: a test collection for content-based image retrieval. 0905.4627v2 (2009). http://cophir.isti.cnr.it
  7. 7.
    Bustos, B., Deussen, O., Hiller, S., Keim, D.: A graphics hardware accelerated algorithm for nearest neighbor search. In: Computational Science—ICCS 2006, pp. 196–199 (2006) CrossRefGoogle Scholar
  8. 8.
    Chávez, E., Navarro, G., Baeza-Yates, R., Marroquín, J.L.: Searching in metric spaces. ACM Comput. Surv. 33(3), 273–321 (2001). doi:10.1145/502807.502808 CrossRefGoogle Scholar
  9. 9.
    Deselaers, T., Keysers, D., Ney, H.: Features for image retrieval: an experimental comparison. Inf. Retr. 11(2), 77–107 (2008). doi:10.1007/s10791-007-9039-3 CrossRefGoogle Scholar
  10. 10.
    Garcia, V., Debreuve, E., Barlaud, M.: Fast k nearest neighbor search using gpu. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, CVPRW’08, pp. 1–6. IEEE, New York (2008) Google Scholar
  11. 11.
    Geusebroek, J.M., Burghouts, G.J., Smeulders, A.W.M.: The Amsterdam library of object images. Int. J. Comput. Vis. 61(1), 103–112 (2005) CrossRefGoogle Scholar
  12. 12.
    Hafner, J., Sawhney, H.S., Equitz, W., Flickner, M., Niblack, W.: Efficient color histogram indexing for quadratic form distance functions. IEEE Trans. Pattern Anal. Mach. Intell. 17, 729–736 (1995). doi:10.1109/34.391417 CrossRefGoogle Scholar
  13. 13.
    Hetland, M.L.: The basic principles of metric indexing. In: Coello, C.A.C., Dehuri, S., Ghosh, S. (eds.) Swarm Intelligence for Multi-objective Problems in Data Mining. Studies in Computational Intelligence, vol. 242. Springer, Berlin (2009) CrossRefGoogle Scholar
  14. 14.
    Hetland, M.L.: Ptolemaic indexing. arXiv:0911.4384 [cs.DS] (2009)
  15. 15.
    Hu, R., Rüger, S., Song, D., Liu, H., Huang, Z.: Dissimilarity measures for content-based image retrieval. In: Proc. IEEE International Conference on Multimedia & Expo, pp. 1365–1368 (2008). doi:10.1109/ICME.2008.4607697 Google Scholar
  16. 16.
    Huiskes, M.J., Lew, M.S.: The mir flickr retrieval evaluation. In: Proc. ACM MIR, pp. 39–43 (2008) Google Scholar
  17. 17.
    Leow, W.K., Li, R.: The analysis and applications of adaptive-binning color histograms. Comput. Vis. Image Underst. 94(1–3), 67–91 (2004). doi:10.1016/j.cviu.2003.10.010 CrossRefGoogle Scholar
  18. 18.
    Lieberman, M., Sankaranarayanan, J., Samet, H.: A fast similarity join algorithm using graphics processing units. In: IEEE 24th International Conference on Data Engineering, ICDE 2008, pp. 1111–1120. IEEE, New York (2008) CrossRefGoogle Scholar
  19. 19.
    Lokoč, J., Hetland, M., Skopal, T., Beecks, C.: Ptolemaic indexing of the signature quadratic form distance. In: Proceedings of the Fourth International Conference on Similarity Search and Applications, pp. 9–16. ACM, New York (2011) Google Scholar
  20. 20.
    Lokoč, J.: Cloud of points generator. SIRET Research Group (2010). http://siret.ms.mff.cuni.cz/projects/pointgenerator/
  21. 21.
    Mémoli, F., Sapiro, G.: Comparing point clouds. In: SGP’04: Proceedings of the 2004 Eurographics/ACM SIGGRAPH Symposium on Geometry Processing, pp. 32–40. ACM, New York (2004). doi:10.1145/1057432.1057436 CrossRefGoogle Scholar
  22. 22.
    Mico, M.L., Oncina, J., Vidal, E.: A new version of the nearest-neighbour approximating and eliminating search algorithm (aesa) with linear preprocessing time and memory requirements. Pattern Recognit. Lett. 15(1), 9–17 (1994). doi:10.1016/0167-8655(94)90095-7 CrossRefGoogle Scholar
  23. 23.
    Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Trans. Pattern Anal. Mach. Intell. 27(10), 1615–1630 (2005). doi:10.1109/TPAMI.2005.188 CrossRefGoogle Scholar
  24. 24.
    Navarro, G.: Analyzing metric space indexes: what for? In: IEEE SISAP 2009, pp. 3–10 (2009) Google Scholar
  25. 25.
  26. 26.
    Pan, J., Manocha, D.: Fast gpu-based locality sensitive hashing for k-nearest neighbor computation. In: Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 211–220. ACM, New York (2011) Google Scholar
  27. 27.
    Puzicha, J., Buhmann, J., Rubner, Y., Tomasi, C.: Empirical evaluation of dissimilarity measures for color and texture. In: Proc. IEEE International Conference on Computer Vision, vol. 2, pp. 1165–1172 (1999) CrossRefGoogle Scholar
  28. 28.
    Rubner, Y., Tomasi, C., Guibas, L.J.: The earth mover’s distance as a metric for image retrieval. Int. J. Comput. Vis. 40(2), 99–121 (2000). doi:10.1023/A:1026543900054 MATHCrossRefGoogle Scholar
  29. 29.
    Samet, H.: Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann, San Mateo (2006) MATHGoogle Scholar
  30. 30.
    Skopal, T., Bartoš, T., Lokoč, J.: On (not) indexing quadratic form distance by metric access methods. In: Proc. Extending Database Technology (EDBT). ACM, New York (2011) Google Scholar
  31. 31.
    Zezula, P., Amato, G., Dohnal, V., Batko, M.: Similarity Search: The Metric Space Approach. Advances in Database Systems. Springer, New York (2005) Google Scholar

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Martin Kruliš
    • 1
  • Tomáš Skopal
    • 1
  • Jakub Lokoč
    • 1
  • Christian Beecks
    • 2
  1. 1.SIRET Research Group, Faculty of Mathematics and PhysicsCharles University in PraguePragueCzech Republic
  2. 2.Data Management and Data Exploration GroupRWTH Aachen UniversityAachenGermany

Personalised recommendations