Video object segmentation for automatic image annotation of ethernet connectors with environment mapping and 3D projection

Danta, Marrone; Dreyer, Pedro; Bezerra, Daniel; Reis, Gabriel; Souza, Ricardo; Lins, Silvia; Kelner, Judith; Sadok, Djamel

doi:10.1007/s11042-022-13128-z

Published: 04 May 2022

Multimedia Tools and Applications, Volume 81, pages 39891–39913 (2022)

Marrone Danta ORCID: orcid.org/0000-0002-7927-8472¹,
Pedro Dreyer¹,
Daniel Bezerra¹,
Gabriel Reis¹,
Ricardo Souza²,
Silvia Lins²,
Judith Kelner¹ &
Djamel Sadok¹

Abstract

Creating a dataset is time-consuming and sometimes discourages researchers from pursuing their goals. To overcome this problem, we present and discuss two approaches to automating the process. Both save user time and resources and rely on video object segmentation, combined with object tracking and 3D projection. In our scenario, we acquire images from a moving robotic arm and, for each approach, generate a distinct annotated dataset. We evaluated the precision of the annotations by comparing them against a manually annotated dataset. As a complementary test of the quality of the generated datasets, and to show that our contribution generalizes, we applied them to detection and classification problems. In both tests, we rely on deep learning solutions based on Convolutional Neural Networks. For detection, we used YOLO; on the projection dataset it obtained an F1-Score, accuracy, and mAP of 0.846, 0.924, and 0.875, respectively. On the tracking dataset, it achieved an F1-Score of 0.861, an accuracy of 0.932, and an mAP of 0.894. For classification, we adopted accuracy and F1-Score as metrics and used the well-known networks VGG, DenseNet, MobileNet, Inception, and ResNet. The VGG architecture outperformed the others on both the projection and tracking datasets. On the projection dataset, it reached an accuracy and F1-Score of 0.997 and 0.993, respectively. On the tracking dataset, it achieved an accuracy of 0.991 and an F1-Score of 0.981.
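As a rough illustration of the kind of comparison the abstract describes, the sketch below scores an automatically generated bounding box against a manual one and computes an F1-Score from detection counts. The abstract does not state which overlap measure was used for the annotation comparison, so intersection over union (IoU) is an assumption here, and the function names `iou` and `f1_score` and the box layout are hypothetical.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Width and height of the overlap region, clamped at zero when disjoint.
    overlap_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = overlap_w * overlap_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), written directly in terms of counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0
```

For example, `iou(auto_box, manual_box)` could be thresholded (0.5 is a common choice, again an assumption) to decide whether an automatic annotation counts as a true positive before the counts are passed to `f1_score`.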

Acknowledgements

This work was partially supported by Ericsson Research (Brazil), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).

Author information

Authors and Affiliations

Grupo de Pesquisa em Redes e Telecomunicação, Universidade Federal de Pernambuco, Centro de Informática, Pernambuco, Brazil
Marrone Danta, Pedro Dreyer, Daniel Bezerra, Gabriel Reis, Judith Kelner & Djamel Sadok
Ericsson Research, Indaiatuba, São Paulo, Brazil
Ricardo Souza & Silvia Lins

Corresponding author

Correspondence to Marrone Danta.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Danta, M., Dreyer, P., Bezerra, D. et al. Video object segmentation for automatic image annotation of ethernet connectors with environment mapping and 3D projection. Multimed Tools Appl 81, 39891–39913 (2022). https://doi.org/10.1007/s11042-022-13128-z

Received: 09 November 2020
Revised: 17 February 2021
Accepted: 10 April 2022
Published: 04 May 2022
Issue Date: November 2022
DOI: https://doi.org/10.1007/s11042-022-13128-z
