International Journal of Computer Vision, Volume 121, Issue 2, pp 253–280

Large-Scale Gaussian Process Inference with Generalized Histogram Intersection Kernels for Visual Recognition Tasks

  • Erik Rodner
  • Alexander Freytag
  • Paul Bodesheim
  • Björn Fröhlich
  • Joachim Denzler

Abstract

We present new methods for fast Gaussian process (GP) inference in large-scale scenarios, including exact multi-class classification with label regression, hyperparameter optimization, and uncertainty prediction. In contrast to previous approaches, we use a full Gaussian process model without sparse approximation techniques. Our methods are based on exploiting generalized histogram intersection kernels and their fast kernel multiplications. We empirically validate the suitability of our techniques in a wide range of scenarios with tens of thousands of examples. Whereas plain GP models are intractable in these settings due to both memory consumption and computation time, our results show that exact inference can indeed be performed efficiently. As a consequence, we enable every important piece of the Gaussian process framework (learning, inference, hyperparameter optimization, variance estimation, and online learning) to be used in realistic scenarios with more than a handful of examples.
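The "fast kernel multiplications" mentioned in the abstract can be illustrated with a minimal sketch for the plain histogram intersection kernel, k(x, y) = Σ_d min(x_d, y_d). The abstract does not spell out the authors' algorithms, so the following is only one well-known way to compute the product K·α without forming the n × n kernel matrix: sort each feature dimension once and use prefix sums. The function names (`hik`, `hik_matvec_naive`, `hik_matvec_sorted`) and the toy data are hypothetical and chosen for illustration.

```python
from itertools import accumulate


def hik(x, y):
    """Histogram intersection kernel: sum of element-wise minima."""
    return sum(min(a, b) for a, b in zip(x, y))


def hik_matvec_naive(X, alpha):
    """Reference product K @ alpha using every pairwise kernel value."""
    return [sum(hik(xi, xj) * aj for xj, aj in zip(X, alpha)) for xi in X]


def hik_matvec_sorted(X, alpha):
    """Same product without forming K, via one sort per feature dimension."""
    n, dims = len(X), len(X[0])
    result = [0.0] * n
    for d in range(dims):
        # Indices of the examples, ordered by their value in dimension d.
        order = sorted(range(n), key=lambda i: X[i][d])
        values = [X[i][d] for i in order]
        weights = [alpha[i] for i in order]
        # Prefix sums over the sorted order:
        #   weighted[r] = sum of alpha_j * x_j over the r + 1 smallest values
        #   plain[r]    = sum of alpha_j       over the r + 1 smallest values
        weighted = list(accumulate(w * v for w, v in zip(weights, values)))
        plain = list(accumulate(weights))
        total = plain[-1]
        for r, i in enumerate(order):
            # min(x_i, x_j) is x_j for entries up to rank r and x_i afterwards.
            result[i] += weighted[r] + values[r] * (total - plain[r])
    return result


# Toy data (assumed for illustration): four 3-bin histograms and a weight vector.
X = [[0.2, 0.5, 0.3], [0.6, 0.1, 0.3], [0.1, 0.1, 0.8], [0.4, 0.4, 0.2]]
alpha = [1.0, -2.0, 0.5, 3.0]
fast = hik_matvec_sorted(X, alpha)
slow = hik_matvec_naive(X, alpha)
```

For n examples with D dimensions, the sorted variant needs O(D n log n) time and O(n) extra memory per dimension, compared with O(D n²) time for the naive product. Matrix-vector products of this kind are the basic operation of iterative linear solvers such as conjugate gradients, which is presumably why avoiding the explicit kernel matrix matters at the scale the abstract describes.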

Keywords

  • Large-scale learning
  • Gaussian processes
  • Hyperparameter optimization
  • Visual recognition

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Erik Rodner (1, 2)
  • Alexander Freytag (1, 2)
  • Paul Bodesheim (3)
  • Björn Fröhlich (4)
  • Joachim Denzler (1, 2)

  1. Friedrich Schiller University Jena, Jena, Germany
  2. Michael Stifel Center Jena, Jena, Germany
  3. Max Planck Institute for Biogeochemistry, Jena, Germany
  4. Daimler AG Research & Development, Böblingen, Germany
