Approximate similarity search for online multimedia services on distributed CPU–GPU platforms


Similarity search in high-dimensional spaces is a pivotal operation for several database applications, including online content-based multimedia services. With the increasing popularity of multimedia applications, these services are facing new challenges regarding (1) the very large and growing volumes of data to be indexed and searched and (2) the need to reduce the response times observed by end-users. In addition, the nature of the interactions between users and online services creates fluctuating query request rates throughout execution, which requires a similarity search engine to adapt in order to make better use of the computation platform and minimize response times. In this work, we address these challenges with Hypercurves, a flexible framework for answering approximate k-nearest neighbor (kNN) queries for very large multimedia databases. Hypercurves executes in hybrid CPU–GPU environments and is able to attain massive query-processing rates through the cooperative use of these devices. Hypercurves also changes its CPU–GPU task partitioning dynamically according to the observed load, aiming for optimal response times. In our empirical evaluation, dynamic task partitioning reduced query response times by approximately 50% compared to the best static task partition. Due to a probabilistic proof of equivalence to the sequential kNN algorithm, the CPU–GPU execution of Hypercurves in distributed (multi-node) environments can be aggressively optimized, attaining superlinear scalability while still guaranteeing, with high probability, results at least as good as those from the sequential algorithm.
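As a point of reference for the kNN queries discussed above, the following minimal sketch shows an exact, brute-force kNN query and a partitioned variant that answers the query on each data partition and merges the local top-k lists. It is not the Hypercurves algorithm, which is approximate and runs on CPU–GPU hardware; it only illustrates, under our own assumptions, why splitting a database across nodes need not degrade the answer relative to a sequential scan. The function names `knn_exact` and `knn_partitioned`, the Euclidean distance, and the partition layout are all hypothetical choices made for illustration.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]
Neighbor = Tuple[float, int]  # (distance to query, global point id)


def knn_exact(query: Point, data: List[Point], k: int,
              id_offset: int = 0) -> List[Neighbor]:
    """Brute-force kNN: the k closest points as (distance, id), nearest first."""
    scored = ((math.dist(query, p), id_offset + i) for i, p in enumerate(data))
    return heapq.nsmallest(k, scored)


def knn_partitioned(query: Point, partitions: List[List[Point]],
                    k: int) -> List[Neighbor]:
    """Answer the query on each partition, then merge the local top-k lists.

    Assumption: each partition stands in for the data held by one node.
    """
    local_results: List[Neighbor] = []
    offset = 0
    for part in partitions:
        # Each partition contributes at most k candidates.
        local_results.extend(knn_exact(query, part, k, id_offset=offset))
        offset += len(part)
    # The global top-k is always contained in the union of the local top-k lists.
    return heapq.nsmallest(k, local_results)


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0), (5.0, 5.0), (0.5, 0.5)]
    parts = [data[:2], data[2:4], data[4:]]
    print(knn_partitioned((0.0, 0.0), parts, k=3))
```

For exact search the merged result matches the sequential one by construction. In the approximate setting the abstract describes, the corresponding guarantee is probabilistic and is established by the authors' proof, not by this sketch.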





We would like to express our gratitude to the reviewers for their valuable comments, which helped us to improve our work both in terms of content and presentation. E. Valle thanks CNPq and FAPESP for the financial support of this work. R. Torres thanks CAPES, CNPq (grants 306580/2012-8, 484254/2012-0), and FAPESP for the financial support. W. Meira Jr. thanks CNPq, FAPEMIG and InWeb for financial support of this work. E. Valle and G. Teodoro thank the CENAPAD/UNICAMP for making available the computational resources required by the expensive experiments of this work. This research also used resources of the Keeneland Computing Facility at the Georgia Institute of Technology, which is supported by the National Science Foundation under Contract OCI-0910735.

Author information



Corresponding author

Correspondence to George Teodoro.


About this article

Cite this article

Teodoro, G., Valle, E., Mariano, N. et al. Approximate similarity search for online multimedia services on distributed CPU–GPU platforms. The VLDB Journal 23, 427–448 (2014).



  • Descriptor indexing
  • Multimedia databases
  • Information retrieval
  • Hypercurves
  • Filter-stream