The VLDB Journal, Volume 23, Issue 3, pp. 427–448

Approximate similarity search for online multimedia services on distributed CPU–GPU platforms

  • George Teodoro (corresponding author)
  • Eduardo Valle
  • Nathan Mariano
  • Ricardo Torres
  • Wagner Meira Jr.
  • Joel H. Saltz
Regular Paper


Similarity search in high-dimensional spaces is a pivotal operation for several database applications, including online content-based multimedia services. With the increasing popularity of multimedia applications, these services face new challenges: (1) the very large and growing volumes of data to be indexed and searched, and (2) the need to reduce response times as observed by end-users. In addition, the way users interact with online services produces query request rates that fluctuate during execution, so a similarity search engine must adapt in order to make better use of the computing platform and minimize response times. In this work, we address these challenges with Hypercurves, a flexible framework for answering approximate k-nearest neighbor (kNN) queries over very large multimedia databases. Hypercurves executes in hybrid CPU–GPU environments and attains massive query-processing rates through the cooperative use of these devices. Hypercurves also changes its CPU–GPU task partitioning dynamically according to the observed load, aiming for optimal response times. In our empirical evaluation, dynamic task partitioning reduced query response times by approximately 50% compared to the best static task partition. Owing to a probabilistic proof of equivalence to the sequential kNN algorithm, the CPU–GPU execution of Hypercurves in distributed (multi-node) environments can be aggressively optimized, attaining superlinear scalability while still guaranteeing, with high probability, results at least as good as those of the sequential algorithm.
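The abstract relates the distributed execution of Hypercurves to the sequential kNN algorithm. The deterministic core of that kind of argument can be illustrated with a minimal Python sketch. It rests on several assumptions of ours, none taken from the paper: exact brute-force search inside each partition, Euclidean distance, round-robin partitioning across simulated nodes, and the hypothetical helper names `sequential_knn`, `local_knn`, and `distributed_knn`. Under those assumptions, if every node returns its local top-k, merging the sorted local lists yields exactly the sequential top-k. This is not Hypercurves' approximate index, and the probabilistic part of the paper's guarantee, which concerns approximate per-node search, is not modeled here.

```python
import heapq
import itertools
import math
from typing import List, Sequence, Tuple

# A result is a (distance, point_id) pair; comparing tuples breaks distance
# ties by id, so both the exact and the merged results are deterministic.
Result = Tuple[float, int]


def sequential_knn(db: Sequence[Sequence[float]], q: Sequence[float], k: int) -> List[Result]:
    """Exact brute-force kNN over the whole database, nearest first."""
    return heapq.nsmallest(k, ((math.dist(q, p), i) for i, p in enumerate(db)))


def local_knn(partition: Sequence[Tuple[int, Sequence[float]]],
              q: Sequence[float], k: int) -> List[Result]:
    """Exact kNN restricted to one node's partition of (point_id, point) pairs."""
    return heapq.nsmallest(k, ((math.dist(q, p), i) for i, p in partition))


def distributed_knn(db: Sequence[Sequence[float]], q: Sequence[float],
                    k: int, n_nodes: int) -> List[Result]:
    """Simulated multi-node kNN: partition, search locally, merge top-k lists."""
    # Assumption: round-robin partitioning; any disjoint partitioning works.
    partitions: List[List[Tuple[int, Sequence[float]]]] = [[] for _ in range(n_nodes)]
    for i, p in enumerate(db):
        partitions[i % n_nodes].append((i, p))
    # Each hypothetical node returns its k best candidates, already sorted.
    local_results = [local_knn(part, q, k) for part in partitions]
    # Every global top-k point lies in its own partition's top-k, so a k-way
    # merge of the sorted local lists yields exactly the sequential answer.
    return list(itertools.islice(heapq.merge(*local_results), k))


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    db = [tuple(rng.random() for _ in range(16)) for _ in range(2000)]
    q = tuple(rng.random() for _ in range(16))
    assert distributed_knn(db, q, 10, n_nodes=4) == sequential_knn(db, q, 10)
    print("distributed result matches sequential kNN")
```

In this scheme each node sends at most k candidates to the coordinator, so merge traffic grows with the number of nodes and with k, not with the database size. Once per-node search becomes approximate, this exact equivalence no longer holds automatically, which is why a probabilistic argument is needed in that setting.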


Keywords: Descriptor indexing, Multimedia databases, Information retrieval, Hypercurves, Filter-stream, GPGPU



We would like to express our gratitude to the reviewers for their valuable comments, which helped us improve our work in terms of both content and presentation. E. Valle thanks CNPq and FAPESP for the financial support of this work. R. Torres thanks CAPES, CNPq (grants 306580/2012-8, 484254/2012-0), and FAPESP for the financial support. W. Meira Jr. thanks CNPq, FAPEMIG and InWeb for financial support of this work. E. Valle and G. Teodoro thank CENAPAD/UNICAMP for making available the computational resources required by the expensive experiments of this work. This research also used resources of the Keeneland Computing Facility at the Georgia Institute of Technology, which is supported by the National Science Foundation under Contract OCI-0910735.

Supplementary material

Supplementary material 1: 778_2013_329_MOESM1_ESM.pdf (PDF, 256 KB)



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • George Teodoro (1; corresponding author)
  • Eduardo Valle (2)
  • Nathan Mariano (3)
  • Ricardo Torres (4)
  • Wagner Meira Jr. (3)
  • Joel H. Saltz (1)

  1. Center for Comprehensive Informatics, Emory University, Atlanta, USA
  2. Recod Lab/DCA/FEEC, State University of Campinas, Campinas, Brazil
  3. Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
  4. Recod Lab/DSI/IC, State University of Campinas, Campinas, Brazil
