
Visualizing Object Detection Features

International Journal of Computer Vision

Abstract

We introduce algorithms to visualize the feature spaces used by object detectors. Our method works by inverting a visual feature back to multiple natural images. These visualizations allow us to analyze object detection systems in new ways and to gain insight into a detector's failures. For example, when we visualize the features of high-scoring false alarms, we find that, although they are clearly wrong in image space, they often look deceptively similar to true positives in feature space. This result suggests that many of these false alarms are caused by our choice of feature space, and that creating a better learning algorithm or building bigger datasets is unlikely to correct these errors without improving the features. By visualizing feature spaces, we can gain a more intuitive understanding of recognition systems.
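The feature inversion described above can be illustrated, very roughly, as sparse coding against a pair of dictionaries: codes are inferred in feature space and then decoded with a paired image-space dictionary. The sketch below is a minimal illustration under that assumption (the dictionaries `D_feat` and `D_img` and all other names are hypothetical, and this is not the authors' published algorithm); it uses ISTA to solve the \(\ell_1\)-regularized fit.

```python
import numpy as np

def ista(D, y, lam=0.1, n_iter=200):
    """Solve min_a 0.5*||y - D a||^2 + lam*||a||_1 via iterative
    shrinkage-thresholding (ISTA)."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - y)
        z = a - grad / L                   # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return a

def invert_feature(y, D_feat, D_img, lam=0.1):
    """Infer sparse codes for the feature y in feature space, then decode
    them with the paired image-space dictionary."""
    a = ista(D_feat, y, lam)
    return D_img @ a
```

With learned paired dictionaries, `invert_feature` would map a descriptor back toward pixels; as written it only demonstrates the optimization.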


[Figs. 1–23 appear in the full article.]

Notes

  1. Available online at http://mit.edu/hoggles.

  2. We found that a sparse \(\alpha_i\) improves our results. While our method also works when regularizing with \(||\alpha_i||_2\) instead, that choice tends to produce blurrier images.

  3. We chose Lab because Euclidean distance in this space is known to be perceptually uniform (Jain 1989), which we suspect better matches human interpretation.
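The choice of Lab in Note 3 can be made concrete. The sketch below is a standard sRGB-to-Lab conversion (D65 white point, pure NumPy; it is illustrative and not code from the paper), after which Euclidean distance approximates perceived color difference.

```python
import numpy as np

def srgb_to_lab(rgb):
    """Convert an sRGB triple (values in [0, 1]) to CIE Lab, D65 white point."""
    rgb = np.asarray(rgb, dtype=float)
    # Undo the sRGB gamma to get linear RGB.
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> CIE XYZ (D65 primaries).
    M = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    xyz = M @ lin
    # Normalize by the D65 reference white, then apply the Lab nonlinearity.
    xyz /= np.array([0.95047, 1.0, 1.08883])
    d = 6.0 / 29.0
    f = np.where(xyz > d ** 3, np.cbrt(xyz), xyz / (3 * d ** 2) + 4.0 / 29.0)
    return np.array([116 * f[1] - 16,
                     500 * (f[0] - f[1]),
                     200 * (f[1] - f[2])])

def lab_distance(rgb1, rgb2):
    """Euclidean distance in Lab, roughly proportional to perceived difference."""
    return float(np.linalg.norm(srgb_to_lab(rgb1) - srgb_to_lab(rgb2)))
```

For example, `srgb_to_lab([1.0, 1.0, 1.0])` gives the reference white (L near 100, a and b near 0), and distances between colors in this space track human judgments more closely than distances in raw RGB.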

References

  • Alahi, A., Ortiz, R., & Vandergheynst, P. (2012). Freak: Fast retina keypoint. In 2012 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 510–517). IEEE.

  • Biggio, B., Nelson, B., & Laskov, P. (2012). Poisoning attacks against support vector machines. arXiv preprint arXiv:1206.6389.

  • Bruckner, D. (2014). Ml-o-scope: a diagnostic visualization system for deep machine learning pipelines. Technical report, DTIC Document.

  • Calonder, M., Lepetit, V., Strecha, C., & Fua, P. (2010). Brief: Binary robust independent elementary features. In Computer vision—ECCV 2010 (pp. 778–792).

  • Chen, C. Y., & Grauman, K. (2014). Inferring unseen views of people. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2003–2010).

  • Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE computer society conference on computer vision and pattern recognition 2005 (CVPR 2005) (Vol. 1, pp. 886–893). IEEE.

  • d’Angelo, E., Alahi, A., & Vandergheynst, P. (2012). Beyond bits: Reconstructing images from local binary descriptors. In 2012 21st international conference on pattern recognition (ICPR) (pp. 935–938). IEEE.

  • Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In IEEE conference on computer vision and pattern recognition, 2009, CVPR 2009 (pp. 248–255). IEEE.

  • Divvala, S. K., Efros, A. A., & Hebert, M. (2012). How important are deformable parts in the deformable parts model? In Computer vision—ECCV 2012. Workshops and demonstrations (pp. 31–40). Springer.

  • Dosovitskiy, A., & Brox, T. (2015). Inverting convolutional networks with convolutional networks. arXiv preprint arXiv:1506.02753.

  • Everingham, M., Van Gool, L., Williams, C., Winn, J., & Zisserman, A. (2005). Pascal visual object classes challenge results. Available from http://www.pascal-network.org.

  • Felzenszwalb, P. F., Girshick, R. B., & McAllester, D. (2010a). Cascade object detection with deformable part models. In 2010 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2241–2248). IEEE.

  • Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010b). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.

  • Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 580–587).

  • Gosselin, F., & Schyns, P. G. (2003). Superstitious perceptions reveal properties of internal representations. Psychological Science, 14(5), 505–509.

  • Hariharan, B., Malik, J., & Ramanan, D. (2012). Discriminative decorrelation for clustering and classification. In Computer vision—ECCV 2012 (pp. 459–472). Springer.

  • Hoiem, D., Chodpathumwan, Y., & Dai, Q. (2012). Diagnosing error in object detectors. In Computer vision—ECCV 2012 (pp. 340–353). Springer.

  • Huang, D. A., & Wang, Y. C. (2013). Coupled dictionary and feature space learning with applications to cross-domain image synthesis and recognition. In Proceedings of the IEEE international conference on computer vision (pp. 2496–2503).

  • Jain, A. K. (1989). Fundamentals of Digital Image Processing. Upper Saddle River: Prentice-Hall Inc.

  • Jia, Y. (2013). Caffe: An open source convolutional architecture for fast feature embedding.

  • Kato, H., & Harada, T. (2014). Image reconstruction from bag-of-visual-words. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 955–962).

  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105).

  • LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.

  • Lenc, K., & Vedaldi, A. (2015). Understanding image representations by measuring their equivariance and equivalence. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 991–999).

  • Liu, L., & Wang, L. (2012). What has my classifier learned? Visualizing the classification rules of bag-of-feature model by support region detection. In 2012 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3586–3593). IEEE.

  • Lowe, D. G. (1999). Object recognition from local scale-invariant features. In The proceedings of the seventh IEEE international conference on computer vision, 1999 (Vol. 2, pp. 1150–1157). IEEE.

  • Mahendran, A., & Vedaldi, A. (2015). Understanding deep image representations by inverting them. In 2015 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 5188–5196). IEEE.

  • Mairal, J., Bach, F., Ponce, J., & Sapiro, G. (2009). Online dictionary learning for sparse coding. In Proceedings of the 26th annual international conference on machine learning (pp. 689–696). ACM.

  • Malisiewicz, T., Gupta, A., & Efros, A.A. (2011). Ensemble of exemplar-SVMs for object detection and beyond. In 2011 IEEE international conference on computer vision (ICCV) (pp. 89–96). IEEE.

  • Nishimoto, S., Vu, A. T., Naselaris, T., Benjamini, Y., Yu, B., & Gallant, J. L. (2011). Reconstructing visual experiences from brain activity evoked by natural movies. Current Biology, 21(19), 1641–1646.

  • Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175.

  • Parikh, D., & Zitnick, C. (2011). Human-debugging of machines. In NIPS WCSSWC (Vol. 2, p. 7).

  • Parikh, D., & Zitnick, C. L. (2010). The role of features, algorithms and data in visual recognition. In 2010 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2328–2335). IEEE.

  • Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.

  • Sadeghi, M.A., & Forsyth, D. (2013). Fast template evaluation with vector quantization. In Advances in neural information processing systems (pp. 2949–2957).

  • Simonyan, K., Vedaldi, A., & Zisserman, A. (2013). Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034.

  • Tatu, A., Lauze, F., Nielsen, M., & Kimia, B. (2011). Exploring the representation capabilities of the hog descriptor. In 2011 IEEE international conference on computer vision workshops (ICCV Workshops) (pp. 1410–1417). IEEE.

  • Vondrick, C., Khosla, A., Malisiewicz, T., & Torralba, A. (2013). HOGgles: Visualizing object detection features. In Proceedings of the IEEE international conference on computer vision (pp. 1–8).

  • Vondrick, C., Pirsiavash, H., Oliva, A., & Torralba, A. (2015). Learning visual biases from human imagination. In Advances in neural information processing systems (pp. 289–297).

  • Wang, S., Zhang, L., Liang, Y., & Pan, Q. (2012). Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis. In 2012 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2216–2223). IEEE.

  • Weinzaepfel, P., Jégou, H., & Pérez, P. (2011). Reconstructing an image from its local descriptors. In 2011 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 337–344). IEEE.

  • Yang, J., Wright, J., Huang, T. S., & Ma, Y. (2010). Image super-resolution via sparse representation. IEEE Transactions on Image Processing, 19(11), 2861–2873.

  • Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer vision—ECCV 2014 (pp. 818–833). Springer.

  • Zhang, L., Dibeklioglu, H., & van der Maaten, L. (2014). Speeding up tracking by ignoring features. In 2014 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1266–1273). IEEE.

  • Zhu, X., Vondrick, C., Ramanan, D., & Fowlkes, C. (2012). Do we need more training data or better models for object detection? In BMVC (Vol. 3, p. 5). Citeseer.

  • Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320.


Acknowledgments

We thank the CSAIL Vision Group for important discussions. Funding was provided by an NSF GRFP and a Google Ph.D. Fellowship to CV, a Facebook fellowship to AK, and a Google research award, ONR MURI N000141010933, and NSF Career Award No. 0747120 to AT.

Author information

Correspondence to Carl Vondrick.

Additional information

Communicated by Derek Hoiem.

About this article

Cite this article

Vondrick, C., Khosla, A., Pirsiavash, H. et al. Visualizing Object Detection Features. Int J Comput Vis 119, 145–158 (2016). https://doi.org/10.1007/s11263-016-0884-7
