Instance Segmentation and Tracking with Cosine Embeddings and Recurrent Hourglass Networks

  • Christian Payer
  • Darko Štern
  • Thomas Neff
  • Horst Bischof
  • Martin Urschler
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11071)


Unlike semantic segmentation, instance segmentation assigns a unique label to each individual instance of the same class. In this work, we propose a novel recurrent fully convolutional network architecture for tracking such instance segmentations over time. The architecture incorporates convolutional gated recurrent units (ConvGRU) into a stacked hourglass network to exploit the temporal information in videos. Furthermore, we train the network with a novel embedding loss based on cosine similarities, such that the network predicts a unique embedding for every instance throughout a video. These embeddings are then clustered across subsequent video frames to create the final tracked instance segmentations. We evaluate the recurrent hourglass network by segmenting left ventricles in MR videos of the heart, where it outperforms a network that does not incorporate video information. We also show the applicability of the cosine embedding loss for segmenting leaf instances in still images of plants. Finally, we evaluate the framework for instance segmentation and tracking on six datasets of the ISBI cell tracking challenge, where it shows state-of-the-art performance.
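The abstract does not state the loss formula, so the following is only a minimal sketch of what an embedding loss based on cosine similarities could look like. It assumes one embedding vector per pixel and an integer instance label per pixel. The pull term (embeddings close to their instance mean), the squared push term (other instances pushed toward orthogonality), and the function names are all assumptions made for illustration, not the authors' exact formulation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_embedding_loss(embeddings, labels):
    """Hypothetical cosine-based embedding loss (illustrative assumption).

    embeddings: list of per-pixel embedding vectors
    labels:     list of per-pixel instance ids, same length as embeddings
    """
    instance_ids = sorted(set(labels))
    if not instance_ids:
        return 0.0
    total = 0.0
    for instance in instance_ids:
        inside = [e for e, l in zip(embeddings, labels) if l == instance]
        outside = [e for e, l in zip(embeddings, labels) if l != instance]
        center = mean_vector(inside)
        # Pull term: pixels of one instance should align with their mean.
        pull = 1.0 - sum(cosine_similarity(center, e) for e in inside) / len(inside)
        # Push term: pixels of other instances should be orthogonal to it.
        push = 0.0
        if outside:
            push = sum(cosine_similarity(center, e) ** 2 for e in outside) / len(outside)
        total += pull + push
    return total / len(instance_ids)
```

In this sketch the loss is zero when the embeddings of different instances are mutually orthogonal and identical within each instance, and it grows as embeddings of different instances align. For simplicity the push term compares each instance against all other labelled pixels.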


Keywords: Cell, Tracking, Segmentation, Instances, Recurrent, Video, Embeddings



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Institute of Computer Graphics and Vision, Graz University of Technology, Graz, Austria
  2. Ludwig Boltzmann Institute for Clinical Forensic Imaging, Graz, Austria
  3. BioTechMed-Graz, Graz, Austria
