Abstract
Creating a dataset is time-consuming and sometimes discourages researchers from pursuing their goals. To overcome this problem, we present and discuss two solutions for automating the process. Both save valuable user time and resources, and both use video object segmentation with object tracking and 3D projection. In our scenario, we acquire images from a moving robotic arm and, for each approach, generate a distinct annotated dataset. We evaluated the precision of the annotations by comparing them with a manually annotated dataset. As a complementary test to assess the quality of the generated datasets and to show that our contribution generalizes, we applied them to detection and classification problems. In both tests, we rely on solutions based on Convolutional Neural Networks and deep learning. For detection, we used YOLO and obtained, on the projection dataset, an F1-score, accuracy, and mAP of 0.846, 0.924, and 0.875, respectively. On the tracking dataset, we achieved an F1-score of 0.861, an accuracy of 0.932, and an mAP of 0.894. For classification, we adopted accuracy and F1-score as metrics and used the well-known networks VGG, DenseNet, MobileNet, Inception, and ResNet. The VGG architecture outperformed the others on both the projection and tracking datasets. On the projection dataset, it reached an accuracy and F1-score of 0.997 and 0.993, respectively. Similarly, on the tracking dataset, it achieved an accuracy of 0.991 and an F1-score of 0.981.
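As an illustration of the metrics named in the abstract, the following minimal Python sketch computes the F1-score and accuracy from confusion counts, together with the intersection over union (IoU) of two bounding boxes. The abstract does not state how the automatic annotations were compared with the manual ones, so using IoU for that comparison is an assumption. The function names, the box layout `(x_min, y_min, x_max, y_max)`, and the example counts are hypothetical and are not taken from the paper.

```python
from typing import Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_and_accuracy(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """F1-score and accuracy from true/false positive/negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return f1, accuracy


if __name__ == "__main__":
    # Made-up boxes: an automatic annotation versus a manual one.
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
    # Made-up confusion counts, for illustration only.
    print(f1_and_accuracy(tp=8, fp=2, fn=2, tn=8))
```

In a detection setting, an IoU threshold is typically used to decide whether a predicted box counts as a true positive before the counts are passed to a function such as `f1_and_accuracy`; the threshold used by the authors is not given here.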
Acknowledgements
This work was partially supported by Ericsson Research (Brazil), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Cite this article
Danta, M., Dreyer, P., Bezerra, D. et al. Video object segmentation for automatic image annotation of ethernet connectors with environment mapping and 3D projection. Multimed Tools Appl 81, 39891–39913 (2022). https://doi.org/10.1007/s11042-022-13128-z
DOI: https://doi.org/10.1007/s11042-022-13128-z