Discriminative Mixture-of-Templates for Viewpoint Classification

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6315)


Object viewpoint classification aims at predicting an approximate 3D pose of objects in a scene and is receiving increasing attention. State-of-the-art approaches to viewpoint classification use generative models to capture relations between object parts. In this work we propose to use a mixture of holistic templates (e.g. HOG) and discriminative learning for joint viewpoint classification and category detection. Inspired by the work of Felzenszwalb et al. (2009), we discriminatively train multiple components simultaneously for each object category. A large number of components are learned in the mixture, and they are associated with canonical viewpoints of the object through different levels of supervision: fully supervised, semi-supervised, or unsupervised. We show that discriminative learning is capable of producing mixture components that directly provide robust viewpoint classification, significantly outperforming the state of the art: we improve the viewpoint accuracy on the Savarese et al. 3D Object database from 57% to 74%, and that on the VOC 2006 car database from 73% to 86%. In addition, the mixture-of-templates approach to object viewpoint/pose has a natural extension to the continuous case by discriminatively learning a linear appearance model locally at each discrete view. We evaluate continuous viewpoint estimation on a dataset of everyday objects collected using IMUs for ground-truth annotation: our mixture model shows great promise compared with a number of baselines, including discrete nearest neighbor and linear regression.
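The discrete case described above can be illustrated with a minimal sketch. It assumes, as in the mixture models of Felzenszwalb et al. that inspired this work, that each component is a linear template scored against a feature vector and that the best-scoring component supplies both the detection score and the viewpoint label. The function name `classify_viewpoint`, the bias terms, the threshold, and the toy four-dimensional features are all hypothetical and are not taken from the paper; real HOG features would be much higher-dimensional.

```python
from typing import List, Sequence, Tuple


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product of a template and a feature vector."""
    return sum(a * b for a, b in zip(w, x))


def classify_viewpoint(
    feature: Sequence[float],
    templates: List[Sequence[float]],
    biases: Sequence[float],
    viewpoints: Sequence[str],
    threshold: float = 0.0,
) -> Tuple[bool, str, float]:
    """Score one window against every mixture component.

    Assumption (not stated in the abstract): the detection score is the
    maximum linear response over components, and the index of the winning
    component is read off as the viewpoint label.
    """
    scores = [dot(w, feature) + b for w, b in zip(templates, biases)]
    best = max(range(len(scores)), key=scores.__getitem__)
    detected = scores[best] > threshold
    return detected, viewpoints[best], scores[best]


# Toy example: three hypothetical components, one per canonical view.
templates = [
    [1.0, 0.0, 0.0, 0.0],  # "front" template
    [0.0, 1.0, 0.0, 0.0],  # "left" template
    [0.0, 0.0, 1.0, 0.0],  # "back" template
]
biases = [0.0, 0.0, 0.0]
viewpoints = ["front", "left", "back"]

window_feature = [0.1, 0.9, 0.2, 0.0]
detected, view, score = classify_viewpoint(
    window_feature, templates, biases, viewpoints
)
print(detected, view, score)
```

In this sketch a single pass over the components yields category detection and viewpoint classification jointly, which is the property the abstract attributes to the mixture-of-templates approach. How the templates are trained under the different levels of supervision is not covered here.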




References

  1. Koenderink, J., van Doorn, A.: The internal representation of solid shape with respect to vision. Biological Cybernetics 32, 211–216 (1979)
  2. Lowe, D.: Distinctive image features from scale-invariant keypoints. Int’l. J. Comp. Vision 60, 91–110 (2004)
  3. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: CVPR (2005)
  4. Felzenszwalb, P., Girshick, R., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part based models. TPAMI (2009)
  5. Savarese, S., Fei-Fei, L.: 3D generic object categorization, localization and pose estimation. In: ICCV (2007)
  6. Sun, M., Su, H., Savarese, S., Fei-Fei, L.: A multi-view probabilistic model for 3D object classes. In: CVPR, pp. 1247–1254 (2009)
  7. Su, H., Sun, M., Fei-Fei, L., Savarese, S.: Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories. In: ICCV (2009)
  8. Everingham, M., Zisserman, A., Williams, C.K.I., Van Gool, L.: The PASCAL Visual Object Classes Challenge 2006, VOC 2006 Results (2006),
  9. Arie-Nachimson, M., Basri, R.: Constructing implicit 3D shape models for pose estimation. In: ICCV (2009)
  10. Cyr, C., Kimia, B.: A similarity-based aspect-graph approach to 3D object recognition. Int’l. J. Comp. Vision 57, 5–22 (2004)
  11. Hoiem, D., Rother, C., Winn, J.: 3D LayoutCRF for multi-view object class recognition and segmentation. In: CVPR (2007)
  12. Kushal, A., Schmid, C., Ponce, J.: Flexible object models for category-level 3D object recognition. In: CVPR (2004)
  13. Chiu, H., Kaelbling, L., Lozano-Perez, T.: Virtual-training for multi-view object class recognition. In: CVPR (2007)
  14. Rothganger, F., Lazebnik, S., Schmid, C., Ponce, J.: 3D object modeling and recognition using local affine-invariant image descriptors and multi-view spatial constraints. Int’l. J. Comp. Vision 66, 231–259 (2006)
  15. Berg, A., Berg, T., Malik, J.: Shape matching and object recognition using low distortion correspondence. In: CVPR, vol. 1, pp. 26–33 (2005)
  16. Bulthoff, H., Edelman, S.: Psychophysical support for a two-dimensional view interpolation theory of object recognition. PNAS 89, 60–64 (1992)
  17. DeMenthon, D., Davis, L.: Model-based object pose in 25 lines of code. Int’l. J. Comp. Vision 15, 123–141 (1995)
  18. Lavallee, S., Szeliski, R.: Recovering the position and orientation of free-form objects from image contours using 3D distance maps. IEEE Trans. PAMI 17, 378–390 (1995)
  19. Collet, A., Berenson, D., Srinivasa, S., Ferguson, D.: Object recognition and full pose registration from a single image for robotic manipulation. In: ICRA (2009)
  20. Detry, R., Pugeault, N., Piater, J.: A probabilistic framework for 3D visual object representation. IEEE Trans. PAMI 31, 1790–1803 (2009)
  21. Thomas, A., Ferrari, V., Leibe, B., Tuytelaars, T., Schiele, B., Gool, L.V.: Towards multi-view object class detection. In: CVPR (2006)
  22. Liebelt, J., Schmid, C., Schertler, K.: Viewpoint-independent object class detection using 3D feature maps. In: CVPR (2008)
  23. Vedaldi, A., Gulshan, V., Varma, M., Zisserman, A.: Multiple kernels for object detection. In: NIPS (2009)
  24. Lampert, C.: Partitioning of image datasets using discriminative context information. In: CVPR, pp. 1–8 (2008)
  25. Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines (2001), software available at
  26. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. PAMI 22, 888–905 (2000)
  27. Aiolli, F., Sperduti, A.: Multiclass classification with multi-prototype support vector machines. Journal of Machine Learning Research (2005)
  28. Manning, C.D., Raghavan, P., Schutze, H.: Introduction to Information Retrieval. Cambridge University Press, Cambridge (2008)
  29. Rosenhahn, B., Brox, T., Weickert, J.: Three-dimensional shape knowledge for joint image segmentation and pose tracking. Int’l. J. Comp. Vision 73, 243–262 (2007)
  30. Ozuysal, M., Lepetit, V., Fua, P.: Pose estimation for category specific multiview object localization. In: CVPR (2009)

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  1. University of California at Berkeley, Berkeley, USA
  2. Intel Labs Seattle, Seattle, USA
