Part-Pair Representation for Part Localization

  • Jiongxin Liu
  • Yinxiao Li
  • Peter N. Belhumeur
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8690)


In this paper, we propose a novel part-pair representation for part localization. In this representation, an object is treated as a collection of part pairs that model its shape and appearance. By changing the set of pairs used, we can impose either stronger or weaker geometric constraints on the part configuration. For appearance, we build a detector for each part pair, so that the detectors model the appearance of an object at different levels of granularity. Our part localization method exploits the part-pair representation by combining non-parametric exemplars with parametric regression models. The non-parametric exemplars help generate reliable part hypotheses from very noisy pair detections. The regression models then group the part hypotheses in a flexible way to predict the part locations. We evaluate our method extensively on the CUB-200-2011 dataset [32], where we achieve significant improvement over the state-of-the-art method on bird part localization. We also experiment with human pose estimation, where our method produces results comparable to existing work.
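As a rough illustration of the idea that the chosen set of pairs controls how strongly the part configuration is constrained, the sketch below enumerates two pair sets over a handful of parts and computes simple geometry for one pair. It is not the authors' implementation: the part names, the "complete" and "star" pair sets, and the helper functions `all_pairs`, `star_pairs`, and `pair_geometry` are all assumptions made for this example.

```python
import math
from itertools import combinations

# Hypothetical part names, chosen only for illustration.
PARTS = ["beak", "crown", "back", "tail"]

def all_pairs(parts):
    """Every unordered pair of parts: the densest pair set,
    which ties each part to every other part."""
    return list(combinations(parts, 2))

def star_pairs(parts, root):
    """Pairs linking one root part to each other part: a sparser
    pair set that leaves non-root parts unconstrained against each other."""
    return [(root, p) for p in parts if p != root]

def pair_geometry(locations, pair):
    """Offset, length, and orientation of a pair, given (x, y) part locations."""
    (xa, ya), (xb, yb) = locations[pair[0]], locations[pair[1]]
    dx, dy = xb - xa, yb - ya
    return {
        "offset": (dx, dy),
        "length": math.hypot(dx, dy),
        "angle": math.atan2(dy, dx),
    }

dense = all_pairs(PARTS)            # 6 pairs for 4 parts
sparse = star_pairs(PARTS, "beak")  # 3 pairs for 4 parts

# Made-up pixel coordinates for two parts.
locations = {"beak": (0.0, 0.0), "tail": (3.0, 4.0)}
geom = pair_geometry(locations, ("beak", "tail"))
```

With n parts, the complete set has n(n-1)/2 pairs while the star set has n-1, which is one simple way to picture the trade-off between stronger and weaker geometric constraints mentioned in the abstract.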


part localization, part-pair representation, pose estimation


  1. Amberg, B., Vetters, T.: Optimal landmark detection using shape models and branch and bound. In: Proc. ICCV (2011)
  2. Azizpour, H., Laptev, I.: Object detection using strongly-supervised deformable part models. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part I. LNCS, vol. 7572, pp. 836–849. Springer, Heidelberg (2012)
  3. Belhumeur, P.N., Jacobs, D.W., Kriegman, D.J., Kumar, N.: Localizing parts of faces using a consensus of exemplars. In: Proc. CVPR (2011)
  4. Berg, T., Belhumeur, P.N.: POOF: Part-based one-vs-one features for fine-grained categorization, face verification, and attribute estimation. In: Proc. CVPR (2013)
  5. Bourdev, L., Maji, S., Brox, T., Malik, J.: Detecting people using mutually consistent poselet activations. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part VI. LNCS, vol. 6316, pp. 168–181. Springer, Heidelberg (2010)
  6. Branson, S., Beijbom, O., Belongie, S.: Efficient large-scale structured learning. In: Proc. CVPR (2013)
  7. Cao, X., Wei, Y., Wen, F., Sun, J.: Face alignment by explicit shape regression. In: Proc. CVPR (2012)
  8. Chen, X., Mottaghi, R., Liu, X., Fidler, S., Urtasun, R., Yuille, A.: Detect what you can: Detecting and representing objects using holistic models and body parts. In: Proc. CVPR (2014)
  9. Cootes, T., Edwards, G., Taylor, C.: Active appearance models. IEEE TPAMI (2001)
  10. Cristinacce, D., Cootes, T.: Feature detection and tracking with constrained local models. In: Proc. BMVC (2006)
  11. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Proc. CVPR (2005)
  12. Dollár, P.: Piotr’s Image and Video Matlab Toolbox (PMT),
  13. Dollár, P., Appel, R., Kienzle, W.: Crosstalk cascades for frame-rate pedestrian detection. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part II. LNCS, vol. 7573, pp. 645–659. Springer, Heidelberg (2012)
  14. Dollár, P., Belongie, S., Perona, P.: The fastest pedestrian detector in the west. In: Proc. BMVC (2010)
  15. Dollár, P., Tu, Z., Perona, P., Belongie, S.: Integral channel features. In: Proc. BMVC (2009)
  16. Everingham, M., Sivic, J., Zisserman, A.: “Hello! my name is... buffy” automatic naming of characters in tv video. In: Proc. BMVC (2006)
  17. Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D.: Object detection with discriminatively trained part based models. In: IEEE TPAMI (2010)
  18. Felzenszwalb, P.F., Huttenlocher, D.P.: Pictorial structures for object recognition. IJCV 61(1), 55–79 (2005)
  19. Johnson, S., Everingham, M.: Clustered pose and nonlinear appearance models for human pose estimation. In: Proc. BMVC (2010)
  20. Liu, J., Belhumeur, P.N.: Bird part localization using exemplar-based models with enforced pose and subcategory consistency. In: Proc. ICCV (2013)
  21. Matthews, I., Baker, S.: Active appearance models revisited. In: IJCV (2004)
  22. Milborrow, S., Nicolls, F.: Locating facial features with an extended active shape model. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part IV. LNCS, vol. 5305, pp. 504–513. Springer, Heidelberg (2008)
  23. Ouyang, W., Wang, X.: Joint deep learning for pedestrian detection. In: Proc. ICCV (2013)
  24. Pishchulin, L., Andriluka, M., Gehler, P., Schiele, B.: Poselet conditioned pictorial structures. In: Proc. CVPR (2013)
  25. Pishchulin, L., Andriluka, M., Gehler, P., Schiele, B.: Strong appearance and expressive spatial models for human pose estimation. In: Proc. ICCV (2013)
  26. Ramanan, D.: Learning to parse images of articulated bodies. In: Proc. NIPS (2006)
  27. Ren, X., Ramanan, D.: Histograms of sparse codes for object detection. In: Proc. CVPR (2013)
  28. Singh, S., Gupta, A., Efros, A.A.: Unsupervised discovery of mid-level discriminative patches. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part II. LNCS, vol. 7573, pp. 73–86. Springer, Heidelberg (2012)
  29. Sun, M., Savarese, S.: Articulated part-based model for joint object detection and pose estimation. In: Proc. ICCV (2011)
  30. Szegedy, C., Toshev, A., Erhan, D.: Deep neural networks for object detection. In: Proc. NIPS (2013)
  31. Viola, P., Jones, M.: Robust real-time object detection. IJCV 57(2), 137–154 (2001)
  32. Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The caltech-ucsd birds-200-2011 dataset. Computation & Neural Systems Technical Report, CNS-TR-2011-001 (2011)
  33. Wang, Y., Tran, D., Liao, Z.: Learning hierarchical poselets for human parsing. In: Proc. CVPR (2011)
  34. Yang, Y., Ramanan, D.: Articulated pose estimation with flexible mixtures-of-parts. In: Proc. CVPR (2011)
  35. Zhou, F., Brandt, J., Lin, Z.: Exemplar-based graph matching for robust facial landmark localization. In: Proc. ICCV (2013)
  36. Zhu, X., Ramanan, D.: Face detection, pose estimation, and landmark localization in the wild. In: Proc. CVPR (2012)

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Jiongxin Liu
    • 1
  • Yinxiao Li
    • 1
  • Peter N. Belhumeur
    • 1
  1. Columbia University, USA
