Sparselet Models for Efficient Multiclass Object Detection

  • Hyun Oh Song
  • Stefan Zickler
  • Tim Althoff
  • Ross Girshick
  • Mario Fritz
  • Christopher Geyer
  • Pedro Felzenszwalb
  • Trevor Darrell
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7573)


We develop an intermediate representation for deformable part models and show that this representation has favorable performance characteristics for multi-class problems when the number of classes is high. Our model uses sparse coding of part filters to represent each filter as a sparse linear combination of shared dictionary elements. This leads to a universal set of parts that are shared among all object classes. Reconstructing the original part filter responses via a sparse matrix-vector product reduces computation relative to conventional part filter convolutions. Our model is well suited to a parallel implementation, and we report a new GPU DPM implementation that takes advantage of sparse coding of part filters. The speed-up offered by our intermediate representation and parallel computation enables real-time DPM detection of 20 different object classes on a laptop computer.
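The core computational idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it assumes: NumPy only; random arrays standing in for the HOG feature map, the dictionary, and the learned part filters; and orthogonal matching pursuit (OMP) as the sparse coder (the abstract says only "sparse coding", so OMP is our choice here). The helper names `correlate_valid`, `omp`, and `reconstruct` are hypothetical. The image is convolved once with each of the K shared dictionary elements, and every part filter response is then rebuilt as a sparse weighted sum of those K shared responses.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate_valid(feat, filt):
    """Valid-mode cross-correlation of an (H, W, C) feature map with an (h, w, C) filter."""
    h, w, c = filt.shape
    windows = sliding_window_view(feat, (h, w, c))  # shape (H-h+1, W-w+1, 1, h, w, c)
    return np.einsum("xyzhwc,hwc->xy", windows, filt)


def omp(D, y, k):
    """Orthogonal matching pursuit: approximate y using at most k unit-norm columns of D."""
    residual = y.copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))  # atom most correlated with residual
        if j not in support:
            support.append(j)
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)  # refit on the support
        residual = y - D[:, support] @ sol
    coef[support] = sol
    return coef


def reconstruct(alpha, S):
    """Rebuild each part response from shared responses S, touching only nonzero coefficients."""
    out = np.zeros((alpha.shape[0],) + S.shape[1:])
    for i, row in enumerate(alpha):
        for j in np.flatnonzero(row):
            out[i] += row[j] * S[j]
    return out


rng = np.random.default_rng(0)
h, w, c = 6, 6, 4            # hypothetical part-filter size (cells x cells x channels)
K, n_parts, k = 32, 10, 3    # dictionary size, number of part filters, sparsity level

D = rng.standard_normal((h * w * c, K))
D /= np.linalg.norm(D, axis=0)                      # unit-norm dictionary elements
filters = rng.standard_normal((n_parts, h * w * c))  # stand-ins for learned part filters

# Sparse-code each part filter: row i of alpha has at most k nonzeros.
alpha = np.stack([omp(D, f, k) for f in filters])

feat = rng.standard_normal((40, 50, c))  # stand-in for a HOG feature map

# Step 1: convolve the feature map once with each shared dictionary element.
S = np.stack([correlate_valid(feat, D[:, j].reshape(h, w, c)) for j in range(K)])

# Step 2: part responses as a sparse combination of the K shared responses.
R = reconstruct(alpha, S)

# By linearity, this equals convolving directly with the sparse-coded filters D @ alpha[i].
direct = np.stack([correlate_valid(feat, (D @ a).reshape(h, w, c)) for a in alpha])
print(np.allclose(R, direct))
```

Here `alpha` is stored as a dense array for brevity, but `reconstruct` only visits its nonzero entries, which mimics a sparse matrix-vector product. The cost is K convolutions shared by every class plus a few scaled additions per part. Conventional evaluation needs one full convolution per part filter, so the savings grow as the total number of parts across all classes greatly exceeds K.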


Keywords: Sparse Coding · Object Detection · Deformable Part Models



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Hyun Oh Song (1)
  • Stefan Zickler (2)
  • Tim Althoff (1)
  • Ross Girshick (3)
  • Mario Fritz (4)
  • Christopher Geyer (2)
  • Pedro Felzenszwalb (5)
  • Trevor Darrell (1)

  1. UC Berkeley, USA
  2. iRobot, USA
  3. University of Chicago, USA
  4. Max Planck Institute for Informatics, Germany
  5. Brown University, USA
