Visualizing Object Detection Features

Vondrick, Carl; Khosla, Aditya; Pirsiavash, Hamed; Malisiewicz, Tomasz; Torralba, Antonio

doi:10.1007/s11263-016-0884-7


Published: 01 March 2016

Volume 119, pages 145–158 (2016)

International Journal of Computer Vision



Abstract

We introduce algorithms to visualize the feature spaces used by object detectors. Our method works by inverting a visual feature back to multiple natural images. We found that these visualizations let us analyze object detection systems in new ways and gain new insight into detector failures. For example, when we visualize the features of high-scoring false alarms, we find that, although they are clearly wrong in image space, they often look deceptively similar to true positives in feature space. This result suggests that many of these false alarms are caused by our choice of feature space, and that creating a better learning algorithm or building bigger datasets is unlikely to correct these errors without improving the features. By visualizing feature spaces, we can gain a more intuitive understanding of recognition systems.
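The observation that clearly wrong images can look like true positives in feature space is easiest to see with a toy case. The sketch below assumes a simplified, single-cell histogram of unsigned gradient orientations in the spirit of HOG (Dalal and Triggs 2005); it is not the detector feature studied in the paper and omits block normalization and interpolation. Because unsigned orientations fold θ and θ + π together, a patch and its photographic negative land on exactly the same descriptor. The function name `orientation_histogram` and the 9-bin setting are illustrative choices, not taken from the paper.

```python
import math
from typing import List

NUM_BINS = 9  # unsigned orientations in [0, pi), 20 degrees per bin (illustrative)


def orientation_histogram(patch: List[List[float]]) -> List[float]:
    """Magnitude-weighted histogram of unsigned gradient orientations
    over the interior pixels of a grayscale patch (one toy 'cell')."""
    hist = [0.0] * NUM_BINS
    bin_width = math.pi / NUM_BINS
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            # Centered [-1, 0, 1] differences.
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Fold theta and theta + pi together: the gradient's sign is discarded.
            theta = math.atan2(gy, gx) % math.pi
            b = min(int(theta / bin_width), NUM_BINS - 1)
            hist[b] += magnitude
    return hist


# A dark-to-bright diagonal edge and its photographic negative.
patch = [
    [0, 0, 0, 90],
    [0, 0, 90, 90],
    [0, 90, 90, 90],
    [90, 90, 90, 90],
]
negative = [[90 - v for v in row] for row in patch]

h1 = orientation_histogram(patch)
h2 = orientation_histogram(negative)
print(all(abs(a - b) < 1e-9 for a, b in zip(h1, h2)))  # True
```

The two patches differ at every pixel, yet any classifier that sees only this descriptor must give them the same score. This is one simple mechanism for such collisions, not a claim about which invariances produce the false alarms analyzed in the paper.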



Notes

Available online at http://mit.edu/hoggles.
We found that a sparse \(\alpha_i\) improves our results. Our method also works when regularizing with \(\|\alpha_i\|_2\) instead, but it tends to produce blurrier images.
We chose Lab because it is known to be approximately perceptually uniform, so Euclidean distance in this space roughly tracks perceived color difference (Jain 1989); we suspect this better matches human interpretation.
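The note on a sparse \(\alpha_i\) implies that each inversion solves for a coefficient vector under an L1 penalty, with \(\|\alpha_i\|_2\) as the alternative the authors tried. The preview does not show the full objective or solver, so the sketch below assumes a paired-dictionary setup: a hypothetical feature-space dictionary `V` and image-space dictionary `U` share one code α. It solves min over α of ½‖Vα − y‖² + λR(α) with ISTA (iterative shrinkage-thresholding, a standard proximal gradient method) and reconstructs x̂ = Uα. All names (`V`, `U`, `y`, `lam`, `sparse_code`) and the toy numbers are illustrative, not the authors' code.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]  # row-major


def matvec(m: Matrix, v: Vector) -> Vector:
    """Return m @ v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def matvec_t(m: Matrix, v: Vector) -> Vector:
    """Return m^T @ v without building the transpose."""
    return [sum(m[i][j] * v[i] for i in range(len(m))) for j in range(len(m[0]))]


def prox_l1(z: Vector, t: float) -> Vector:
    """Soft-thresholding, the proximal operator of t * ||z||_1.
    Every coordinate with |z_j| <= t becomes exactly zero."""
    return [math.copysign(max(abs(zj) - t, 0.0), zj) for zj in z]


def prox_l2_norm(z: Vector, t: float) -> Vector:
    """Proximal operator of t * ||z||_2 (unsquared). It rescales the whole
    vector, so coordinates are never zeroed one at a time."""
    norm = math.sqrt(sum(zj * zj for zj in z))
    if norm <= t:
        return [0.0] * len(z)
    return [(1.0 - t / norm) * zj for zj in z]


def sparse_code(V: Matrix, y: Vector, lam: float,
                prox: Callable[[Vector, float], Vector] = prox_l1,
                iters: int = 500) -> Vector:
    """ISTA for min_a 0.5 * ||V a - y||_2^2 + lam * R(a), where prox is R's prox."""
    # 1 / ||V||_F^2 is a conservative step: ||V||_F^2 bounds the largest
    # eigenvalue of V^T V, which is the Lipschitz constant of the gradient.
    step = 1.0 / sum(vij * vij for row in V for vij in row)
    a = [0.0] * len(V[0])
    for _ in range(iters):
        residual = [p - q for p, q in zip(matvec(V, a), y)]
        grad = matvec_t(V, residual)
        a = prox([aj - step * gj for aj, gj in zip(a, grad)], step * lam)
    return a


# Hypothetical toy dictionaries: V maps codes to feature space, U to pixels.
V = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
U = [[0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]
y = [3.0, 0.5, -2.0]  # a made-up "feature" to invert

alpha = sparse_code(V, y, lam=1.0)
x_hat = matvec(U, alpha)  # reconstruction in the hypothetical image space
print([round(aj, 6) for aj in alpha])  # [2.0, 0.0, -1.0]

alpha_l2 = sparse_code(V, y, lam=1.0, prox=prox_l2_norm)
print(sum(1 for aj in alpha_l2 if aj != 0.0))  # 3
```

With the L1 penalty the small coordinate is driven exactly to zero, while the \(\|\alpha\|_2\) penalty only shrinks the whole code and keeps every coefficient active. A reconstruction built from many small coefficients mixes more dictionary atoms, which is one plausible reading of why the note reports blurrier images.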
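The Lab note relies on the CIE 1976 color difference ΔE*ab, which is simply the Euclidean distance between two colors in L*a*b*. The note does not say which conversion the authors used; this sketch assumes sRGB inputs in [0, 1], the standard sRGB-to-XYZ matrix, and a D65 reference white. The function names are illustrative.

```python
import math
from typing import Tuple

Color = Tuple[float, float, float]

# Assumed D65 reference white in XYZ, with Y normalized to 1.
WHITE_D65 = (0.95047, 1.0, 1.08883)


def _srgb_to_linear(c: float) -> float:
    """Undo the sRGB transfer curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _f(t: float) -> float:
    """CIELAB companding: cube root above a small threshold, linear below it."""
    delta = 6.0 / 29.0
    if t > delta ** 3:
        return t ** (1.0 / 3.0)
    return t / (3.0 * delta ** 2) + 4.0 / 29.0


def srgb_to_lab(rgb: Color) -> Color:
    """Convert an sRGB triple in [0, 1] to (L*, a*, b*)."""
    r, g, b = (_srgb_to_linear(c) for c in rgb)
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    fx, fy, fz = (_f(v / w) for v, w in zip((x, y, z), WHITE_D65))
    return (116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz))


def delta_e76(rgb1: Color, rgb2: Color) -> float:
    """CIE 1976 color difference: Euclidean distance in L*a*b*."""
    return math.dist(srgb_to_lab(rgb1), srgb_to_lab(rgb2))


print(round(delta_e76((1.0, 1.0, 1.0), (0.0, 0.0, 0.0)), 3))  # 100.0
```

White and black differ by about 100, the full span of L*. Later formulas such as CIEDE2000 correct residual non-uniformities in CIELAB, but plain Euclidean distance is the one the note describes.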

References

Alahi, A., Ortiz, R., & Vandergheynst, P. (2012). FREAK: Fast retina keypoint. In 2012 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 510–517). IEEE.
Biggio, B., Nelson, B., & Laskov, P. (2012). Poisoning attacks against support vector machines. arXiv preprint arXiv:1206.6389.
Bruckner, D. (2014). ML-o-scope: A diagnostic visualization system for deep machine learning pipelines. Technical report, DTIC Document.
Calonder, M., Lepetit, V., Strecha, C., & Fua, P. (2010). BRIEF: Binary robust independent elementary features. In Computer vision—ECCV 2010 (pp. 778–792).
Chen, C. Y., & Grauman, K. (2014). Inferring unseen views of people. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2003–2010).
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE computer society conference on computer vision and pattern recognition 2005 (CVPR 2005) (Vol. 1, pp. 886–893). IEEE.
d’Angelo, E., Alahi, A., & Vandergheynst, P. (2012). Beyond bits: Reconstructing images from local binary descriptors. In 2012 21st international conference on pattern recognition (ICPR) (pp. 935–938). IEEE.
Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In IEEE conference on computer vision and pattern recognition, 2009, CVPR 2009 (pp. 248–255). IEEE.
Divvala, S. K., Efros, A. A., & Hebert, M. (2012). How important are deformable parts in the deformable parts model? In Computer vision—ECCV 2012. Workshops and demonstrations (pp. 31–40). Springer.
Dosovitskiy, A., & Brox, T. (2015). Inverting convolutional networks with convolutional networks. arXiv preprint arXiv:1506.02753.
Everingham, M., Van Gool, L., Williams, C., Winn, J., & Zisserman, A. (2005). PASCAL visual object classes challenge results. Available from http://www.pascal-network.org.
Felzenszwalb, P. F., Girshick, R. B., & McAllester, D. (2010a). Cascade object detection with deformable part models. In 2010 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2241–2248). IEEE.
Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010b). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 580–587).
Gosselin, F., & Schyns, P. G. (2003). Superstitious perceptions reveal properties of internal representations. Psychological Science, 14(5), 505–509.
Hariharan, B., Malik, J., & Ramanan, D. (2012). Discriminative decorrelation for clustering and classification. In Computer vision—ECCV 2012 (pp. 459–472). Springer.
Hoiem, D., Chodpathumwan, Y., & Dai, Q. (2012). Diagnosing error in object detectors. In Computer vision—ECCV 2012 (pp. 340–353). Springer.
Huang, D. A., & Wang, Y. C. (2013). Coupled dictionary and feature space learning with applications to cross-domain image synthesis and recognition. In Proceedings of the IEEE international conference on computer vision (pp. 2496–2503).
Jain, A. K. (1989). Fundamentals of Digital Image Processing. Upper Saddle River: Prentice-Hall Inc.
Jia, Y. (2013). Caffe: An open source convolutional architecture for fast feature embedding.
Kato, H., & Harada, T. (2014). Image reconstruction from bag-of-visual-words. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 955–962).
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105).
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
Lenc, K., & Vedaldi, A. (2015). Understanding image representations by measuring their equivariance and equivalence. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 991–999).
Liu, L., & Wang, L. (2012). What has my classifier learned? Visualizing the classification rules of bag-of-feature model by support region detection. In 2012 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3586–3593). IEEE.
Lowe, D. G. (1999). Object recognition from local scale-invariant features. In The proceedings of the seventh IEEE international conference on computer vision, 1999 (Vol. 2, pp. 1150–1157). IEEE.
Mahendran, A., & Vedaldi, A. (2015). Understanding deep image representations by inverting them. In 2015 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 5188–5196). IEEE.
Mairal, J., Bach, F., Ponce, J., & Sapiro, G. (2009). Online dictionary learning for sparse coding. In Proceedings of the 26th annual international conference on machine learning (pp. 689–696). ACM.
Malisiewicz, T., Gupta, A., & Efros, A. A. (2011). Ensemble of exemplar-SVMs for object detection and beyond. In 2011 IEEE international conference on computer vision (ICCV) (pp. 89–96). IEEE.
Nishimoto, S., Vu, A. T., Naselaris, T., Benjamini, Y., Yu, B., & Gallant, J. L. (2011). Reconstructing visual experiences from brain activity evoked by natural movies. Current Biology, 21(19), 1641–1646.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175.
Parikh, D., & Zitnick, C. (2011). Human-debugging of machines. In NIPS WCSSWC (Vol. 2, p. 7).
Parikh, D., & Zitnick, C. L. (2010). The role of features, algorithms and data in visual recognition. In 2010 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2328–2335). IEEE.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.
Sadeghi, M. A., & Forsyth, D. (2013). Fast template evaluation with vector quantization. In Advances in neural information processing systems (pp. 2949–2957).
Simonyan, K., Vedaldi, A., & Zisserman, A. (2013). Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034.
Tatu, A., Lauze, F., Nielsen, M., & Kimia, B. (2011). Exploring the representation capabilities of the HOG descriptor. In 2011 IEEE international conference on computer vision workshops (ICCV Workshops) (pp. 1410–1417). IEEE.
Vondrick, C., Khosla, A., Malisiewicz, T., & Torralba, A. (2013). HOGgles: Visualizing object detection features. In Proceedings of the IEEE international conference on computer vision (pp. 1–8).
Vondrick, C., Pirsiavash, H., Oliva, A., & Torralba, A. (2015). Learning visual biases from human imagination. In Advances in neural information processing systems (pp. 289–297).
Wang, S., Zhang, L., Liang, Y., & Pan, Q. (2012). Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis. In 2012 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2216–2223). IEEE.
Weinzaepfel, P., Jégou, H., & Pérez, P. (2011). Reconstructing an image from its local descriptors. In 2011 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 337–344). IEEE.
Yang, J., Wright, J., Huang, T. S., & Ma, Y. (2010). Image super-resolution via sparse representation. IEEE Transactions on Image Processing, 19(11), 2861–2873.
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer vision—ECCV 2014 (pp. 818–833). Springer.
Zhang, L., Dibeklioglu, H., & van der Maaten, L. (2014). Speeding up tracking by ignoring features. In 2014 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1266–1273). IEEE.
Zhu, X., Vondrick, C., Ramanan, D., & Fowlkes, C. (2012). Do we need more training data or better models for object detection? In BMVC (Vol. 3, p. 5). Citeseer.
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320.


Acknowledgments

We thank the CSAIL Vision Group for important discussions. Funding was provided by an NSF GRFP and a Google Ph.D. Fellowship to CV, a Facebook fellowship to AK, and a Google research award, ONR MURI N000141010933, and NSF Career Award No. 0747120 to AT.

Author information

Authors and Affiliations

Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
Carl Vondrick, Aditya Khosla & Antonio Torralba
University of Maryland, Baltimore County, Baltimore, MD, 21250, USA
Hamed Pirsiavash
Vision.ai, Burlington, VT, 05401, USA
Tomasz Malisiewicz


Corresponding author

Correspondence to Carl Vondrick.

Additional information

Communicated by Derek Hoiem.


About this article

Cite this article

Vondrick, C., Khosla, A., Pirsiavash, H. et al. Visualizing Object Detection Features. Int J Comput Vis 119, 145–158 (2016). https://doi.org/10.1007/s11263-016-0884-7


Received: 20 February 2015
Accepted: 30 January 2016
Published: 01 March 2016
Issue Date: September 2016
DOI: https://doi.org/10.1007/s11263-016-0884-7

