Abstract
We introduce algorithms to visualize the feature spaces used by object detectors. Our method works by inverting a visual feature back to multiple natural images. We find that these visualizations allow us to analyze object detection systems in new ways and gain new insight into the detector’s failures. For example, when we visualize the features for high-scoring false alarms, we discover that, although they are clearly wrong in image space, they often look deceptively similar to true positives in feature space. This result suggests that many of these false alarms are caused by our choice of feature space, and supports the view that creating a better learning algorithm or building bigger datasets is unlikely to correct these errors without improving the features. By visualizing feature spaces, we can gain a more intuitive understanding of recognition systems.
Notes
Available online at http://mit.edu/hoggles.
We found a sparse \(\alpha _i\) improves our results. While our method will work when regularizing with \(||\alpha _i||_2\) instead, it tends to produce more blurred images.
We chose Lab because Euclidean distance in this space is known to be perceptually uniform (Jain 1989), which we suspect better matches human interpretation.
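The note on regularizing \(\alpha _i\) contrasts a sparse (\(\ell _1\)) penalty with an \(||\alpha _i||_2\) penalty. The following is a minimal sketch of that contrast only, not the authors' implementation: the dictionary `D`, the signal `x`, the penalty weight `lam`, and both function names are hypothetical, and the \(\ell _1\) problem is solved with plain iterative soft-thresholding (ISTA) as one possible solver. It illustrates that the \(\ell _1\) penalty drives most coefficients to exactly zero, whereas the squared \(\ell _2\) penalty spreads weight over every dictionary atom.

```python
import numpy as np

def ridge_code(D, x, lam):
    """Closed-form code for 0.5*||x - D a||^2 + 0.5*lam*||a||_2^2."""
    k = D.shape[1]
    return np.linalg.solve(D.T @ D + lam * np.eye(k), D.T @ x)

def lasso_code(D, x, lam, n_iter=2000):
    """ISTA for 0.5*||x - D a||^2 + lam*||a||_1 (assumed solver choice)."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - step * (D.T @ (D @ a - x))                        # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return a

# Hypothetical toy data: a random unit-norm dictionary and a 3-sparse signal.
rng = np.random.default_rng(0)
D = rng.standard_normal((32, 64))
D /= np.linalg.norm(D, axis=0)
true_alpha = np.zeros(64)
true_alpha[[3, 17, 40]] = [1.0, -2.0, 1.5]
x = D @ true_alpha

a_l1 = lasso_code(D, x, lam=0.2)
a_l2 = ridge_code(D, x, lam=0.2)

# Count the dictionary atoms each code actually uses.
n_l1 = np.count_nonzero(a_l1)
n_l2 = np.count_nonzero(a_l2)
print("nonzero coefficients (l1):", n_l1)
print("nonzero coefficients (l2):", n_l2)
```

In an inversion setting, a dense code averages many dictionary atoms together, which is one plausible reading of why the \(\ell _2\) variant yields more blurred images; the toy example above only demonstrates the difference in sparsity, not the visual effect.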
References
Alahi, A., Ortiz, R., & Vandergheynst, P. (2012). Freak: Fast retina keypoint. In 2012 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 510–517). IEEE.
Biggio, B., Nelson, B., & Laskov, P. (2012). Poisoning attacks against support vector machines. arXiv preprint arXiv:1206.6389.
Bruckner, D. (2014). Ml-o-scope: a diagnostic visualization system for deep machine learning pipelines. Technical report, DTIC Document.
Calonder, M., Lepetit, V., Strecha, C., & Fua, P. (2010). Brief: Binary robust independent elementary features. Computer vision—ECCV, 2010 (pp. 778–792).
Chen, C. Y., & Grauman, K. (2014). Inferring unseen views of people. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 2003–2010).
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE computer society conference on computer vision and pattern recognition 2005 (CVPR 2005) (Vol. 1, pp. 886–893). IEEE.
d’Angelo, E., Alahi, A., & Vandergheynst, P. (2012). Beyond bits: Reconstructing images from local binary descriptors. In 2012 21st international conference on pattern recognition (ICPR) (pp. 935–938). IEEE.
Deng, J., Dong, W., Socher, R., Li, L. J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In IEEE conference on computer vision and pattern recognition, 2009, CVPR 2009 (pp. 248–255). IEEE.
Divvala, S. K., Efros, A. A., & Hebert, M. (2012). How important are deformable parts in the deformable parts model? In Computer vision—ECCV 2012. Workshops and demonstrations (pp. 31–40). Springer.
Dosovitskiy, A., & Brox, T. (2015). Inverting convolutional networks with convolutional networks. arXiv preprint arXiv:1506.02753.
Everingham, M., Van Gool, L., Williams, C., Winn, J., & Zisserman, A. (2005). Pascal visual object classes challenge results. Available from http://www.pascal-network.org.
Felzenszwalb, P. F., Girshick, R. B., & McAllester, D. (2010a). Cascade object detection with deformable part models. In 2010 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2241–2248). IEEE.
Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010b). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9), 1627–1645.
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 580–587).
Gosselin, F., & Schyns, P. G. (2003). Superstitious perceptions reveal properties of internal representations. Psychological Science, 14(5), 505–509.
Hariharan, B., Malik, J., & Ramanan, D. (2012). Discriminative decorrelation for clustering and classification. In Computer vision—ECCV 2012 (pp. 459–472). Springer.
Hoiem, D., Chodpathumwan, Y., & Dai, Q. (2012). Diagnosing error in object detectors. In Computer vision—ECCV 2012 (pp. 340–353). Springer.
Huang, D. A., & Wang, Y. C. (2013). Coupled dictionary and feature space learning with applications to cross-domain image synthesis and recognition. In Proceedings of the IEEE international conference on computer vision (pp. 2496–2503).
Jain, A. K. (1989). Fundamentals of Digital Image Processing. Upper Saddle River: Prentice-Hall Inc.
Jia, Y. (2013). Caffe: An open source convolutional architecture for fast feature embedding.
Kato, H., & Harada, T. (2014). Image reconstruction from bag-of-visual-words. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 955–962).
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097–1105).
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
Lenc, K., & Vedaldi, A. (2015). Understanding image representations by measuring their equivariance and equivalence. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 991–999).
Liu, L., & Wang, L. (2012). What has my classifier learned? Visualizing the classification rules of bag-of-feature model by support region detection. In 2012 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3586–3593). IEEE.
Lowe, D. G. (1999). Object recognition from local scale-invariant features. In The proceedings of the seventh IEEE international conference on computer vision, 1999 (Vol. 2, pp. 1150–1157). IEEE.
Mahendran, A., & Vedaldi, A. (2015). Understanding deep image representations by inverting them. In 2015 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 5188–5196). IEEE.
Mairal, J., Bach, F., Ponce, J., & Sapiro, G. (2009). Online dictionary learning for sparse coding. In Proceedings of the 26th annual international conference on machine learning (pp. 689–696). ACM.
Malisiewicz, T., Gupta, A., & Efros, A. A. (2011). Ensemble of exemplar-SVMs for object detection and beyond. In 2011 IEEE international conference on computer vision (ICCV) (pp. 89–96). IEEE.
Nishimoto, S., Vu, A. T., Naselaris, T., Benjamini, Y., Yu, B., & Gallant, J. L. (2011). Reconstructing visual experiences from brain activity evoked by natural movies. Current Biology, 21(19), 1641–1646.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3), 145–175.
Parikh, D., & Zitnick, C. (2011). Human-debugging of machines. In NIPS WCSSWC (Vol. 2, p. 7).
Parikh, D., & Zitnick, C. L. (2010). The role of features, algorithms and data in visual recognition. In 2010 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2328–2335). IEEE.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.
Sadeghi, M. A., & Forsyth, D. (2013). Fast template evaluation with vector quantization. In Advances in neural information processing systems (pp. 2949–2957).
Simonyan, K., Vedaldi, A., & Zisserman, A. (2013). Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034.
Tatu, A., Lauze, F., Nielsen, M., & Kimia, B. (2011). Exploring the representation capabilities of the hog descriptor. In 2011 IEEE international conference on computer vision workshops (ICCV Workshops) (pp. 1410–1417). IEEE.
Vondrick, C., Khosla, A., Malisiewicz, T., & Torralba, A. (2013). Hoggles: Visualizing object detection features. In Proceedings of the IEEE international conference on computer vision (pp. 1–8).
Vondrick, C., Pirsiavash, H., Oliva, A., & Torralba, A. (2015). Learning visual biases from human imagination. In Advances in neural information processing systems (pp. 289–297).
Wang, S., Zhang, L., Liang, Y., & Pan, Q. (2012). Semi-coupled dictionary learning with applications to image super-resolution and photo-sketch synthesis. In 2012 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2216–2223). IEEE.
Weinzaepfel, P., Jégou, H., & Pérez, P. (2011). Reconstructing an image from its local descriptors. In 2011 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 337–344). IEEE.
Yang, J., Wright, J., Huang, T. S., & Ma, Y. (2010). Image super-resolution via sparse representation. IEEE Transactions on Image Processing, 19(11), 2861–2873.
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer vision—ECCV 2014 (pp. 818–833). Springer.
Zhang, L., Dibeklioglu, H., & van der Maaten, L. (2014). Speeding up tracking by ignoring features. In 2014 IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1266–1273). IEEE.
Zhu, X., Vondrick, C., Ramanan, D., & Fowlkes, C. (2012). Do we need more training data or better models for object detection? In BMVC (Vol. 3, p. 5). Citeseer.
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320.
Acknowledgments
We thank the CSAIL Vision Group for important discussions. Funding was provided by an NSF GRFP and a Google Ph.D. Fellowship to CV, a Facebook fellowship to AK, and a Google research award, ONR MURI N000141010933 and NSF Career Award No. 0747120 to AT.
Additional information
Communicated by Derek Hoiem.
About this article
Cite this article
Vondrick, C., Khosla, A., Pirsiavash, H. et al. Visualizing Object Detection Features. Int J Comput Vis 119, 145–158 (2016). https://doi.org/10.1007/s11263-016-0884-7
DOI: https://doi.org/10.1007/s11263-016-0884-7