Accelerating Visual Categorization with the GPU

  • Koen E. A. van de Sande
  • Theo Gevers
  • Cees G. M. Snoek
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6554)


Visual categorization is important for managing large collections of digital images and video, where textual metadata is often incomplete or simply unavailable. The bag-of-words model has become the most powerful method for visual categorization of images and video. Despite its high accuracy, a severe drawback of this model is its high computational cost. Because newer CPU and GPU architectures gain computational power mainly by increasing their level of parallelism, exploiting this parallelism is an important way to handle the computational cost of the bag-of-words approach. In this paper, we analyze the bag-of-words model for visual categorization in terms of computational cost and identify two major bottlenecks: the quantization step and the classification step. We address these bottlenecks with two efficient algorithms, one for quantization and one for classification, that exploit the GPU hardware and the CUDA parallel programming model. The algorithms are designed to keep categorization accuracy intact and give the same numerical results.
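To make the quantization step concrete, the following is a minimal CPU-side sketch in plain Python, not the authors' GPU implementation. It assumes hard assignment of each descriptor to its nearest codebook word under Euclidean distance, which the abstract does not specify; the function names (`quantize`, `bag_of_words_histogram`) and the toy data are hypothetical. The squared distance is expanded as ||d||^2 + ||w||^2 - 2 d·w, so the dominant cost is a set of independent dot products, which is the kind of work that maps naturally onto parallel hardware.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_norm(v: Vector) -> float:
    """Return the squared Euclidean norm of v."""
    return sum(x * x for x in v)


def dot(a: Vector, b: Vector) -> float:
    """Return the dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def quantize(descriptors: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Assign each descriptor to the index of its nearest codebook word.

    Assumption (not stated in the abstract): Euclidean distance with
    hard assignment. Every (descriptor, word) pair is independent, so
    the two loops below could be evaluated in parallel.
    """
    # Codebook norms are computed once and reused for every descriptor.
    word_norms = [squared_norm(w) for w in codebook]
    assignments = []
    for d in descriptors:
        d_norm = squared_norm(d)
        best_word, best_dist = -1, float("inf")
        for j, w in enumerate(codebook):
            # ||d - w||^2 = ||d||^2 + ||w||^2 - 2 * (d . w)
            dist = d_norm + word_norms[j] - 2.0 * dot(d, w)
            if dist < best_dist:
                best_word, best_dist = j, dist
        assignments.append(best_word)
    return assignments


def bag_of_words_histogram(descriptors: Sequence[Vector],
                           codebook: Sequence[Vector]) -> List[int]:
    """Count how many descriptors fall on each codebook word."""
    hist = [0] * len(codebook)
    for word in quantize(descriptors, codebook):
        hist[word] += 1
    return hist


if __name__ == "__main__":
    # Toy 2-D data; real descriptors (e.g. SIFT) have many more dimensions.
    codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
    descriptors = [[0.1, -0.1], [0.9, 1.2], [3.5, 0.2], [1.1, 0.8]]
    print(quantize(descriptors, codebook))                # [0, 1, 2, 1]
    print(bag_of_words_histogram(descriptors, codebook))  # [1, 2, 1]
```

The cost of this step grows with the number of descriptors times the codebook size times the descriptor dimensionality, which gives an intuition for why quantization can become a bottleneck for large codebooks.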

Experiments on large-scale datasets show that, with a parallel implementation on the GPU, quantization is 28 times faster and classification is 35 times faster than a single-threaded CPU version, while giving exactly the same numerical results. The GPU accelerations are applicable to both the learning phase and the testing phase of visual categorization systems. For software visit .


Vector Quantization, Thread Block, Graphic Hardware, Visual Categorization, SIFT Descriptor
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Koen E. A. van de Sande (1)
  • Theo Gevers (1)
  • Cees G. M. Snoek (1)
  1. Intelligent Systems Lab Amsterdam (ISLA), University of Amsterdam, Amsterdam, The Netherlands
