Intelligent and Interactive Video Annotation for Instance Segmentation Using Siamese Neural Networks

Part of the Lecture Notes in Computer Science book series (LNIP, volume 12664)


Training machine learning models in a supervised manner requires vast amounts of labeled data. These labels are typically provided by humans manually annotating samples using a variety of tools. In this work, we propose an intelligent annotation tool to combine the fast and efficient labeling capabilities of modern machine learning models with the reliable and accurate, but slow, correction capabilities of human annotators. We present our approach to interactively condition a model on previously predicted and manually annotated or corrected instances and explore an iterative workflow combining the advantages of the intelligent model and the human annotator for the task of instance segmentation in videos. Thereby, the intelligent model conducts the bulk of the work, performing instance detection, tracking, and segmentation, and enables the human annotator to correct individual frames and instances selectively. The proposed approach avoids the computational cost of online retraining by being based on the one-shot learning paradigm. For this purpose, we use Siamese neural networks to transfer annotations from one video frame to another. Multiple interaction options regarding the choice of the additional input data to the neural network, e.g., model predictions or manual corrections, are explored to refine the given model’s labeling performance and speed up the annotation process.
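The idea of transferring an annotation from one frame to the next with a shared (Siamese) embedding can be illustrated with a minimal sketch. It is not the authors' architecture: the hand-written `embed` function stands in for a trained network, and the names `embed`, `crop`, and `transfer_mask`, as well as the toy frames, are assumptions made for illustration only. The mask passed in may be a model prediction or a manual correction, which mirrors the interaction options mentioned above.

```python
from math import sqrt

def embed(patch):
    """Shared embedding applied to both inputs (stand-in for a trained network).
    Flattens the patch, subtracts the mean, and scales to unit norm."""
    flat = [v for row in patch for v in row]
    mean = sum(flat) / len(flat)
    centered = [v - mean for v in flat]
    norm = sqrt(sum(c * c for c in centered)) or 1.0  # avoid division by zero
    return [c / norm for c in centered]

def crop(frame, top, left, height, width):
    """Cut a height x width window out of a frame given as a list of rows."""
    return [row[left:left + width] for row in frame[top:top + height]]

def transfer_mask(prev_frame, prev_box, prev_mask, next_frame):
    """Find the annotated instance in next_frame and move its mask there.
    prev_box is (top, left, height, width) of the instance in prev_frame."""
    top, left, h, w = prev_box
    template = embed(crop(prev_frame, top, left, h, w))
    best_score, best_pos = float("-inf"), (0, 0)
    # Compare the template embedding with every candidate window.
    for y in range(len(next_frame) - h + 1):
        for x in range(len(next_frame[0]) - w + 1):
            candidate = embed(crop(next_frame, y, x, h, w))
            score = sum(a * b for a, b in zip(template, candidate))
            if score > best_score:
                best_score, best_pos = score, (y, x)
    # Paste the previous mask at the best-matching position.
    y0, x0 = best_pos
    new_mask = [[0] * len(next_frame[0]) for _ in next_frame]
    for dy in range(h):
        for dx in range(w):
            new_mask[y0 + dy][x0 + dx] = prev_mask[top + dy][left + dx]
    return best_pos, best_score, new_mask

def blank(size=6):
    return [[0] * size for _ in range(size)]

# Toy data: a 2x2 object moves from (row 1, col 1) to (row 3, col 2).
obj = [[9, 5], [3, 7]]
frame0, frame1, mask0 = blank(), blank(), blank()
for dy in range(2):
    for dx in range(2):
        frame0[1 + dy][1 + dx] = obj[dy][dx]
        frame1[3 + dy][2 + dx] = obj[dy][dx]
        mask0[1 + dy][1 + dx] = 1  # annotation: predicted or manually corrected

pos, score, mask1 = transfer_mask(frame0, (1, 1, 2, 2), mask0, frame1)
```

In this sketch a human correction is just a replacement for `mask0` before the next transfer, so no retraining is involved; a real system would replace the exhaustive window search and the hand-written embedding with a learned network.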


  • Instance detection
  • Tracking
  • Segmentation
  • Semi-supervised
  • Labeling
  • Video annotation
  • Object detection
  • One-shot learning




This work results from the project KI Data Tooling (19A20001O), funded by BMWi (German Federal Ministry for Economic Affairs and Energy), and the project DeCoInt², supported by the German Research Foundation (DFG) within the priority program SPP 1835: “Kooperativ interagierende Automobile”, grant number SI 674/11-2.

Author information

Corresponding author

Correspondence to Jan Schneegans.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Schneegans, J., Bieshaar, M., Heidecker, F., Sick, B. (2021). Intelligent and Interactive Video Annotation for Instance Segmentation Using Siamese Neural Networks. In: Del Bimbo, A., et al. (eds.) Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol. 12664. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-68798-4

  • Online ISBN: 978-3-030-68799-1

  • eBook Packages: Computer Science, Computer Science (R0)