Multiple Component Learning for Object Detection

  • Piotr Dollár
  • Boris Babenko
  • Serge Belongie
  • Pietro Perona
  • Zhuowen Tu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5303)


Object detection is one of the key problems in computer vision. In the last decade, discriminative learning approaches have proven effective in detecting rigid objects, achieving very low false positive rates. The field has also seen a resurgence of part-based recognition methods, with impressive results on highly articulated, diverse object categories. In this paper we propose a discriminative learning approach for detection that is inspired by part-based recognition approaches. Our method, Multiple Component Learning (MCL), automatically learns individual component classifiers and combines these into an overall classifier. Unlike previous methods, which rely on either fairly restricted part models or labeled part data, MCL learns powerful component classifiers in a weakly supervised manner, where object labels are provided but part labels are not. The basis of MCL lies in learning a set classifier; we achieve this by combining boosting with weakly supervised learning, specifically the Multiple Instance Learning (MIL) framework. MCL is general, and we demonstrate results on a range of data from computer audition and computer vision. In particular, MCL outperforms all existing methods on the challenging INRIA pedestrian detection dataset, and unlike methods that are not part-based, MCL is quite robust to occlusions.
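
The abstract names Multiple Instance Learning (MIL) as the weakly supervised framework but does not give its formulation. As a rough illustration only, the sketch below shows the standard MIL assumption (a bag of instances is positive if at least one instance is positive) and the noisy-OR rule commonly used to turn instance probabilities into a bag probability. The function names `bag_label` and `bag_probability` are hypothetical, and this is not claimed to be the formulation used in the paper.

```python
from typing import Sequence


def bag_label(instance_labels: Sequence[int]) -> int:
    """Standard MIL assumption: a bag is positive (1) if any instance is positive."""
    return int(any(label == 1 for label in instance_labels))


def bag_probability(instance_probs: Sequence[float]) -> float:
    """Noisy-OR: probability that at least one instance in the bag is positive.

    Assumes instance probabilities are independent; this is a common
    MIL modelling choice, not necessarily the one used in the paper.
    """
    prob_no_positive = 1.0
    for p in instance_probs:
        prob_no_positive *= 1.0 - p
    return 1.0 - prob_no_positive


if __name__ == "__main__":
    # An image (bag) labeled only at the object level; patches are instances.
    print(bag_label([0, 0, 1]))
    print(bag_probability([0.5, 0.5]))
```

In a weakly supervised detection setting, the bag would correspond to a labeled image or window and the instances to candidate patches whose individual labels are unknown.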


Keywords: Training Image, Object Detection, Pedestrian Detection, Multiple Instance Learn, Object Label
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Piotr Dollár
    • 1
    • 2
  • Boris Babenko
    • 2
  • Serge Belongie
    • 1
    • 2
  • Pietro Perona
    • 1
  • Zhuowen Tu
    • 3
  1. Electrical Engineering, California Inst. of Tech., USA
  2. Comp. Science & Eng., Univ. of CA, San Diego
  3. Lab of Neuro Imaging, Univ. of CA, Los Angeles
