
Video object segmentation for automatic image annotation of ethernet connectors with environment mapping and 3D projection

Multimedia Tools and Applications

Abstract

The creation of a dataset is time-consuming and sometimes discourages researchers from pursuing their goals. To overcome this problem, we present and discuss two solutions for automating the process. Both save user time and resources and rely on video object segmentation with object tracking and 3D projection. In our scenario, we acquire images from a moving robotic arm and generate a distinct annotated dataset with each approach. We evaluated the precision of the annotations by comparing them with a manually annotated dataset. As a complementary test of the quality of the generated datasets, and to show that our contribution generalizes, we applied them to detection and classification problems. In both tests, we rely on Convolutional Neural Networks and deep learning. For detection, we used YOLO and obtained, on the projection dataset, an F1-Score of 0.846, an accuracy of 0.924, and an mAP of 0.875. On the tracking dataset, we achieved an F1-Score of 0.861, an accuracy of 0.932, and an mAP of 0.894. For classification, we adopted accuracy and F1-Score as metrics and used the well-known networks VGG, DenseNet, MobileNet, Inception, and ResNet. The VGG architecture outperformed the others on both the projection and tracking datasets. On the projection dataset, it reached an accuracy of 0.997 and an F1-Score of 0.993. Similarly, on the tracking dataset, it achieved an accuracy of 0.991 and an F1-Score of 0.981.
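As an illustration of how automatically generated annotations might be compared with manual ones, the following minimal Python sketch matches bounding boxes by intersection over union (IoU) and derives an F1-Score from the resulting counts. The abstract does not state which matching criterion was used, so the IoU criterion, the 0.5 threshold, the box format, and the function names `iou`, `match_annotations`, and `f1_score` are all assumptions made for this example, not the authors' method.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def match_annotations(auto: List[Box], manual: List[Box],
                      threshold: float = 0.5) -> Tuple[int, int, int]:
    """Greedy one-to-one matching of automatic boxes to manual boxes.

    Returns (true positives, false positives, false negatives).
    The 0.5 threshold is an assumed, commonly used default.
    """
    unmatched = list(manual)
    tp = 0
    for box in auto:
        # Pick the still-unmatched manual box that overlaps this one the most.
        best = max(unmatched, key=lambda m: iou(box, m), default=None)
        if best is not None and iou(box, best) >= threshold:
            unmatched.remove(best)
            tp += 1
    fp = len(auto) - tp
    fn = len(unmatched)
    return tp, fp, fn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Made-up boxes, for illustration only.
    auto_boxes = [(0, 0, 10, 10), (100, 100, 110, 110)]
    manual_boxes = [(1, 1, 10, 10), (50, 50, 60, 60)]
    tp, fp, fn = match_annotations(auto_boxes, manual_boxes)
    print(tp, fp, fn, f1_score(tp, fp, fn))
```

The greedy matching keeps the sketch short; a stricter evaluation could use an optimal assignment between the two sets of boxes instead.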



Acknowledgements

This work was partially supported by Ericsson Research (Brazil), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).

Author information

Corresponding author

Correspondence to Marrone Danta.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Danta, M., Dreyer, P., Bezerra, D. et al. Video object segmentation for automatic image annotation of ethernet connectors with environment mapping and 3D projection. Multimed Tools Appl 81, 39891–39913 (2022). https://doi.org/10.1007/s11042-022-13128-z

  • DOI: https://doi.org/10.1007/s11042-022-13128-z
