Object Boundary Detection and Classification with Image-Level Labels

  • Jing Yu Koh
  • Wojciech Samek
  • Klaus-Robert Müller
  • Alexander Binder (corresponding author)
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10496)


Semantic boundary and edge detection aims to simultaneously detect object edge pixels in images and assign class labels to them. Systematically training predictors for this task requires labeling edges in images, which is particularly tedious. We propose a novel strategy for solving this task when pixel-level annotations are not available, performing it in an almost zero-shot manner by relying on conventional whole-image neural network classifiers that were trained using large bounding boxes. Our method performs two steps at test time. First, it predicts the class labels by applying the trained whole-image network to the test images. Second, it computes pixel-wise scores from the obtained predictions by applying backpropagated gradients as well as recent visualization algorithms such as deconvolution and layer-wise relevance propagation. We show that high pixel-wise scores are indicative of the location of semantic boundaries, which suggests that the semantic boundary problem can be approached without using edge labels during the training phase.
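The two test-time steps can be illustrated with a minimal sketch of the gradient variant only. It assumes a toy two-layer ReLU classifier with random weights rather than the networks used in the paper. The names `forward`, `input_gradient`, and `pixel_scores`, the image size, and the choice of an L2 norm over colour channels are all hypothetical and chosen for illustration. Deconvolution and layer-wise relevance propagation are not shown.

```python
import numpy as np

def forward(x, W1, W2):
    """Toy whole-image classifier (an assumption): flatten, linear, ReLU, linear."""
    z = W1 @ x.ravel()          # hidden pre-activations
    h = np.maximum(z, 0.0)      # ReLU
    return z, W2 @ h            # pre-activations and class scores

def input_gradient(x, W1, W2):
    """Step 1: predict the class. Step 2: backpropagate its score to the pixels."""
    z, scores = forward(x, W1, W2)
    c = int(np.argmax(scores))              # predicted class label
    grad_z = W2[c] * (z > 0)                # gradient through the ReLU gate
    grad_x = (W1.T @ grad_z).reshape(x.shape)
    return c, grad_x

def pixel_scores(grad_x):
    """Collapse colour channels into one non-negative score per pixel (L2 norm, an assumption)."""
    return np.linalg.norm(grad_x, axis=-1)

# Random stand-ins for a trained network and a test image (hypothetical sizes).
rng = np.random.default_rng(0)
H, W, C, hidden, n_classes = 8, 8, 3, 16, 5
x = rng.random((H, W, C))
W1 = rng.standard_normal((hidden, H * W * C))
W2 = rng.standard_normal((n_classes, hidden))

c, grad_x = input_gradient(x, W1, W2)
s = pixel_scores(grad_x)        # one score per pixel, shape (H, W)
```

In this sketch, thresholding `s` would give a candidate boundary map for the predicted class `c`. With random weights the map carries no meaning, so the example only shows the data flow from image-level prediction to pixel-wise scores.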



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Jing Yu Koh (1)
  • Wojciech Samek (2)
  • Klaus-Robert Müller (3, 4)
  • Alexander Binder (1), corresponding author
  1. ISTD Pillar, Singapore University of Technology and Design, Singapore, Singapore
  2. Department of Video Coding and Analytics, Fraunhofer Heinrich Hertz Institute, Berlin, Germany
  3. Department of Computer Science, TU Berlin, Berlin, Germany
  4. Department of Brain and Cognitive Engineering, Korea University, Seoul, Republic of Korea
