A Study of Parts-Based Object Class Detection Using Complete Graphs

  • Martin Bergtholdt
  • Jörg Kappes
  • Stefan Schmidt
  • Christoph Schnörr

Abstract

Object detection is one of the key components in modern computer vision systems. While the detection of a specific rigid object under changing viewpoints was considered hard just a few years ago, current research strives to detect and recognize classes of non-rigid, articulated objects. Because clutter and occlusion introduce confusing information almost everywhere, the focus has shifted from holistic approaches to object detection towards representations of individual object parts linked by structural information, along with richer contextual descriptions of object configurations. Along this line of research, we present a practicable and expandable probabilistic framework for parts-based object class representation, enabling the detection of rigid and articulated object classes in arbitrary views. We investigate learning of this representation from labelled training images and infer globally optimal solutions to the contextual MAP-detection problem, using A*-search with a novel lower bound as admissible heuristic. An assessment of the inference performance of Belief Propagation and Tree-Reweighted Belief Propagation is obtained as a by-product. The generality of our approach is demonstrated on four different datasets utilizing domain-dependent information cues.
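To make the inference step concrete, the following is a minimal sketch of A*-search for MAP inference on a fully connected parts model, written as energy minimization. It is an illustration under stated assumptions, not the authors' implementation: the function name `map_astar`, the data layout of `unary` and `pairwise`, and in particular the simple bound used in `heuristic` are all hypothetical. The bound is a generic admissible one (independent minima over the remaining terms), not the novel lower bound the abstract refers to.

```python
import heapq
import itertools


def map_astar(unary, pairwise):
    """Find a minimum-energy assignment of candidates to parts with A*.

    Assumed (hypothetical) data layout:
      unary[i][a]          cost of placing part i at its candidate a
      pairwise[(i, j)][a][b]  cost for i < j with part i at a, part j at b
    The graph is complete: every pair (i, j) with i < j has a table.
    Returns (energy, assignment).
    """
    n = len(unary)
    # Smallest entry of every pairwise table, used for pairs of parts
    # that are both still unassigned.
    min_pair = {
        (i, j): min(min(row) for row in pairwise[(i, j)])
        for i, j in itertools.combinations(range(n), 2)
    }

    def heuristic(assign):
        """Lower bound on the cost of completing a partial assignment."""
        k = len(assign)
        h = 0.0
        # Each unassigned part: best unary cost plus its pairwise costs
        # to the parts that are already fixed.
        for j in range(k, n):
            h += min(
                unary[j][b]
                + sum(pairwise[(i, j)][assign[i]][b] for i in range(k))
                for b in range(len(unary[j]))
            )
        # Each pair of unassigned parts: its cheapest possible entry.
        for j, l in itertools.combinations(range(k, n), 2):
            h += min_pair[(j, l)]
        return h

    # Search nodes are (f = g + h, g, partial assignment as a tuple).
    heap = [(heuristic(()), 0.0, ())]
    while heap:
        _, g, assign = heapq.heappop(heap)
        k = len(assign)
        if k == n:
            # With an admissible bound, the first complete assignment
            # popped from the queue is globally optimal.
            return g, list(assign)
        for a in range(len(unary[k])):
            g_child = (
                g
                + unary[k][a]
                + sum(pairwise[(i, k)][assign[i]][a] for i in range(k))
            )
            child = assign + (a,)
            heapq.heappush(heap, (g_child + heuristic(child), g_child, child))
    raise ValueError("model has no parts or a part has no candidates")
```

Parts are assigned in a fixed order, so the search space is a tree and admissibility of the bound alone is enough for optimality. The bound never overestimates because the true remaining cost is a sum of the same terms, each of which is at least its independent minimum. A tighter bound prunes more of the tree at the price of more work per node.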

Keywords

Object detection, Object class recognition, Graphical models, Conditional random fields, Classification, Multi-class, Learning, Inference, Optimization, Single view, Multiple view, 2D/3D pose

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Martin Bergtholdt (1)
  • Jörg Kappes (1)
  • Stefan Schmidt (1)
  • Christoph Schnörr (1)

  1. Dept. Mathematics and Computer Science, University of Heidelberg, Heidelberg, Germany
