
MVTec D2S: Densely Segmented Supermarket Dataset

  • Patrick Follmann
  • Tobias Böttger
  • Philipp Härtinger
  • Rebecca König
  • Markus Ulrich
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11214)

Abstract

We introduce the Densely Segmented Supermarket (D2S) dataset, a novel benchmark for instance-aware semantic segmentation in an industrial domain. It contains 21,000 high-resolution images with pixel-wise labels of all object instances. The objects comprise groceries and everyday products from 60 categories. The benchmark is designed to resemble the real-world setting of an automatic checkout, inventory, or warehouse system. The training images contain only objects of a single class on a homogeneous background, while the validation and test sets are much more complex and diverse. To further benchmark the robustness of instance segmentation methods, the scenes are acquired under different lighting conditions, rotations, and backgrounds. We ensure that there are no ambiguities in the labels and that every instance is labeled comprehensively. The annotations are pixel-precise and allow using crops of single instances for artificial data augmentation. The dataset covers several challenges highly relevant in the field, such as a limited amount of training data and a high diversity in the test and validation sets. The evaluation of state-of-the-art object detection and instance segmentation methods on D2S reveals significant room for improvement.
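To illustrate the crop-based augmentation mentioned in the abstract, here is a minimal sketch of cutting a single labeled instance out of a training image so it can be pasted onto new backgrounds. It assumes the annotations follow the MS COCO JSON format and uses pycocotools; the file paths are placeholders, and this is an illustrative reading of the idea, not the authors' pipeline.

```python
import numpy as np
from PIL import Image
from pycocotools.coco import COCO

# Hypothetical paths; assumes COCO-style JSON annotations.
coco = COCO("annotations/D2S_training.json")

# Load the first annotated instance and the image it belongs to.
ann = coco.loadAnns(coco.getAnnIds())[0]
img_info = coco.loadImgs(ann["image_id"])[0]
image = np.array(Image.open("images/" + img_info["file_name"]).convert("RGB"))

# annToMask converts the polygon/RLE segmentation to a binary mask.
mask = coco.annToMask(ann)

# Crop the instance to its bounding box and zero out background pixels.
x, y, w, h = (int(round(v)) for v in ann["bbox"])
crop = image[y:y + h, x:x + w] * mask[y:y + h, x:x + w, None]

# `crop` (together with its mask) can now be composited onto arbitrary
# backgrounds to synthesize additional training scenes.
```

Because the labels are pixel-precise, the extracted crop carries no background halo, which is what makes such cut-and-paste augmentation viable.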

Keywords

Instance segmentation · Dataset · Industrial application

Acknowledgements

We would like to thank the students Clarissa Siegfarth, Bela Jugel, Thomas Beraneck, Johannes Köhne, Christoph Ziegler, and Bernie Stöffler for their help in acquiring and annotating the dataset.

Supplementary material

Supplementary material 1 (PDF, 13.9 MB)

Supplementary material 2 (MP4, 4.7 MB)


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. MVTec Software GmbH, Munich, Germany
  2. Technical University of Munich, Munich, Germany
