Abstract
Visual categorization is important for managing large collections of digital images and video, where textual meta-data is often incomplete or simply unavailable. The bag-of-words model has become the most powerful method for visual categorization of images and video. Despite its high accuracy, a severe drawback of this model is its high computational cost. Because newer CPU and GPU architectures gain computational power mainly by increasing their level of parallelism, exploiting this parallelism is an important direction for handling the computational cost of the bag-of-words approach. In this paper, we analyze the bag-of-words model for visual categorization in terms of computational cost and identify two major bottlenecks: the quantization step and the classification step. We address these two bottlenecks with two efficient algorithms, one for quantization and one for classification, that exploit the GPU hardware and the CUDA parallel programming model. The algorithms are designed to keep categorization accuracy intact and give the same numerical results.
Experiments on large-scale datasets show that, with a parallel implementation on the GPU, quantization is 28 times faster and classification is 35 times faster than a single-threaded CPU version, while giving the exact same numerical results. The GPU accelerations are applicable to both the learning phase and the testing phase of visual categorization systems. For software visit http://www.colordescriptors.com/ .
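To make the quantization bottleneck concrete, the following is a minimal CPU-side sketch of hard-assignment quantization in plain Python. It is an illustration under stated assumptions, not the authors' GPU implementation: the function names (`squared_norm`, `quantize`, `bag_of_words`) and the toy data are hypothetical, and the distance decomposition shown is one common way to expose the step as dot products, which is the kind of regular, independent work that parallel hardware handles well.

```python
from collections import Counter


def squared_norm(v):
    """Squared Euclidean norm of a vector given as a list of floats."""
    return sum(x * x for x in v)


def quantize(descriptors, codebook):
    """Assign each descriptor to the index of its nearest codebook entry."""
    # ||c||^2 depends only on the codebook, so it is computed once.
    code_norms = [squared_norm(c) for c in codebook]
    assignments = []
    # Each descriptor is processed independently of the others, which is
    # why this loop is a natural candidate for parallel execution.
    for d in descriptors:
        best_index, best_score = 0, float("inf")
        for j, c in enumerate(codebook):
            # ||d - c||^2 = ||d||^2 + ||c||^2 - 2 * (d . c).
            # ||d||^2 is constant for a fixed d, so it is dropped from
            # the comparison; only the dot product varies per pair.
            score = code_norms[j] - 2.0 * sum(x * y for x, y in zip(d, c))
            if score < best_score:
                best_index, best_score = j, score
        assignments.append(best_index)
    return assignments


def bag_of_words(descriptors, codebook):
    """Histogram of codebook assignments (the bag-of-words vector)."""
    counts = Counter(quantize(descriptors, codebook))
    return [counts.get(j, 0) for j in range(len(codebook))]


# Hypothetical toy data: three 2-D codewords and four 2-D descriptors.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
descriptors = [[0.1, -0.2], [0.9, 1.2], [3.5, 0.5], [1.1, 0.8]]
histogram = bag_of_words(descriptors, codebook)
print(histogram)
```

The cost of this step grows with the number of descriptors times the number of codewords times the descriptor dimensionality, which suggests why it can dominate run time for large codebooks. Written over all descriptor and codeword pairs at once, the dot products form a single matrix product.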
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
van de Sande, K.E.A., Gevers, T., Snoek, C.G.M. (2012). Accelerating Visual Categorization with the GPU. In: Kutulakos, K.N. (eds) Trends and Topics in Computer Vision. ECCV 2010. Lecture Notes in Computer Science, vol 6554. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35740-4_34
DOI: https://doi.org/10.1007/978-3-642-35740-4_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35739-8
Online ISBN: 978-3-642-35740-4
eBook Packages: Computer Science (R0)