International Journal of Computer Vision, Volume 88, Issue 2, pp 169–188

Gaussian Processes for Object Categorization

  • Ashish Kapoor
  • Kristen Grauman
  • Raquel Urtasun
  • Trevor Darrell
Open Access


Discriminative methods for visual object category recognition are typically non-probabilistic, predicting class labels but not directly providing an estimate of uncertainty. Gaussian Processes (GPs) provide a framework for deriving regression techniques with explicit uncertainty models; we show here how Gaussian Processes with covariance functions based on a Pyramid Match Kernel (PMK) can be used for probabilistic object category recognition. Our probabilistic formulation provides a principled way to learn hyperparameters, which we use to learn an optimal combination of multiple covariance functions. It also offers confidence estimates at test points, and naturally allows for an active learning paradigm in which points are optimally selected for interactive labeling. We show that with an appropriate combination of kernels a significant boost in classification performance is possible. Further, our experiments indicate the utility of active learning with probabilistic predictive models, especially when the number of training labels that can be obtained for a category is very small.
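
The abstract compresses three mechanisms: GP prediction with a combined covariance function, choosing the combination weights by maximizing the marginal likelihood, and picking the next image to label from the predictive uncertainty. The pure-Python sketch below illustrates all three on toy feature vectors. It is not the authors' implementation, and several pieces are assumptions: the `rbf` and `hist_intersection` kernels are hypothetical stand-ins for the Pyramid Match Kernel (histogram intersection is the matching primitive PMK builds on), the ±1 labels are treated as GP regression targets with an assumed `noise=0.1`, a coarse grid search in `select_weights` stands in for a proper optimizer of the marginal likelihood, and the score |mean| / sqrt(variance + noise) in `next_query` is an assumed uncertainty criterion.

```python
import math

# --- Minimal linear algebra (pure Python, small matrices only) ---

def cholesky(A):
    """Return lower-triangular L with A = L L^T (A must be positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        x[i] = (b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i]
    return x

def solve_upper_t(L, b):
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

# --- Hypothetical base kernels (stand-ins for the Pyramid Match Kernel) ---

def rbf(x, z, ell=1.0):
    d2 = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-0.5 * d2 / ell ** 2)

def hist_intersection(x, z):
    # Assumes non-negative histogram entries.
    return sum(min(a, b) for a, b in zip(x, z))

def combined_kernel(weights, kernels):
    """Weighted sum of base kernels; non-negative weights keep it a valid covariance."""
    return lambda x, z: sum(w * k(x, z) for w, k in zip(weights, kernels))

# --- GP regression on +/-1 labels ---

def gp_fit(X, y, kern, noise=0.1):
    n = len(X)
    K = [[kern(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y))  # alpha = (K + noise*I)^-1 y
    return L, alpha

def gp_predict(X, L, alpha, kern, x_star):
    """Posterior mean and variance of the latent function at x_star."""
    k_star = [kern(x, x_star) for x in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = kern(x_star, x_star) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)

def log_marginal_likelihood(X, y, kern, noise=0.1):
    L, alpha = gp_fit(X, y, kern, noise)
    fit = -0.5 * sum(yi * ai for yi, ai in zip(y, alpha))
    half_log_det = sum(math.log(L[i][i]) for i in range(len(y)))  # = 0.5 log|K|
    return fit - half_log_det - 0.5 * len(y) * math.log(2 * math.pi)

def select_weights(X, y, kernels, grid, noise=0.1):
    """Pick the kernel weights with the highest marginal likelihood (grid search)."""
    return max(grid, key=lambda w: log_marginal_likelihood(
        X, y, combined_kernel(w, kernels), noise))

def next_query(X, y, kern, pool, noise=0.1):
    """Index of the unlabeled point with the smallest |mean| / sqrt(var + noise)."""
    L, alpha = gp_fit(X, y, kern, noise)
    def score(x):
        mean, var = gp_predict(X, L, alpha, kern, x)
        return abs(mean) / math.sqrt(var + noise)
    return min(range(len(pool)), key=lambda i: score(pool[i]))
```

For example, with positives `[1, 0]`, `[0.9, 0.1]`, negatives `[0, 1]`, `[0.1, 0.9]` and the weight grid `[(1, 0), (0.5, 0.5), (0, 1)]`, the pool point `[0.5, 0.5]` is the one queried, because by symmetry its predicted mean is essentially zero while its variance is not. Using an uncertainty-normalized score rather than the raw mean means a point the model is confidently wrong about is not preferred over one it genuinely cannot call.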


Keywords: Object recognition · Gaussian process · Kernel combination · Active learning



Copyright information

© The Author(s) 2009

Open Access: This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Authors and Affiliations

  • Ashish Kapoor (corresponding author), Microsoft Research, Redmond, USA
  • Kristen Grauman, University of Texas at Austin, Austin, USA
  • Raquel Urtasun, UC Berkeley EECS & ICSI, Berkeley, USA
  • Trevor Darrell, UC Berkeley EECS & ICSI, Berkeley, USA
