Automatic “Ground Truth” Annotation and Industrial Workpiece Dataset Generation for Deep Learning

  • Research Article
  • Published: 2020
International Journal of Automation and Computing

Abstract

In industry, detecting and recognizing industrial workpieces with deep learning methods is becoming common practice. The scarcity of datasets is a major problem in this field, since collecting and annotating them is very labor intensive: researchers who generate a dataset themselves must also annotate it by hand. This is one of the factors that keeps current deep learning based methods from scaling well. At present, very few workpiece datasets exist for industrial applications, and the existing ones are generated from idealized computer aided design (CAD) models of workpieces, with few images of actual workpieces collected and utilized. We propose an automatic industrial workpiece dataset generation method and an automatic ground truth annotation method. Our methods include three algorithms: a point cloud based spatial plane segmentation algorithm that segments the workpieces in the real scene and obtains annotation information for the workpieces in images captured there; a random multiple workpiece generation algorithm that generates abundant composite dataset images with randomly rotated workpiece angles and positions; and a tangent vector based contour tracking and completion algorithm that yields improved contour images. With these procedures, annotation information is obtained automatically, and a JSON format file is generated upon completion of the annotation process. Faster R-CNN (faster region-based convolutional neural network), SSD (single shot multibox detector) and YOLO (you only look once) models are trained on the proposed datasets. The experimental results show the effectiveness and integrity of this dataset generation and annotation method.
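The page itself contains no code, but the first of the three algorithms lends itself to a short illustration. Below is a minimal sketch of plane-removal-based workpiece segmentation on a captured point cloud; it is not the authors' implementation. It assumes the Open3D library (version 0.10 or later), and the file name, RANSAC threshold, and DBSCAN parameters are illustrative placeholders.

```python
# Minimal sketch: isolate workpieces by removing the dominant plane
# (the workbench) from a captured point cloud, then cluster what
# remains. Open3D >= 0.10 assumed; "scene.pcd" and all thresholds
# are illustrative, not values from the paper.
import numpy as np
import open3d as o3d

pcd = o3d.io.read_point_cloud("scene.pcd")

# Fit the dominant spatial plane (the workbench surface) with RANSAC.
plane_model, inliers = pcd.segment_plane(distance_threshold=0.005,
                                         ransac_n=3,
                                         num_iterations=1000)

# Drop the plane points; the remainder belongs to the workpieces.
workpieces = pcd.select_by_index(inliers, invert=True)

# Separate individual workpieces with DBSCAN (label -1 marks noise).
labels = np.array(workpieces.cluster_dbscan(eps=0.01, min_points=50))

# One 3D bounding box per cluster; projecting these boxes through the
# camera intrinsics would give the 2D annotation boxes.
for k in range(labels.max() + 1):
    cluster = workpieces.select_by_index(np.where(labels == k)[0].tolist())
    box = cluster.get_axis_aligned_bounding_box()
    print(f"workpiece {k}: extent = {box.get_extent()}")
```

In this reading, the per-cluster boxes (or the cluster points projected back into the RGB image) would supply the bounding boxes and contours that end up in the generated JSON annotation file.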

References

  1. J. X. Xiao, K. A. Ehinger, J. Hays, A. Torralba, A. Oliva. SUN database: Exploring a large collection of scene categories. International Journal of Computer Vision, vol. 119, no. 1, pp. 3–22, 2016. DOI: https://doi.org/10.1007/s11263-014-0748-y.

  2. A. Torralba, R. Fergus, W. T. Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 11, pp. 1958–1970, 2008. DOI: https://doi.org/10.1109/TPAMI.2008.128.

  3. M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, A. Zisserman. The pascal visual object classes (VOC) challenge. International Journal of Computer Vision, vol. 88, no. 2, pp. 303–338, 2010. DOI: https://doi.org/10.1007/s11263-009-0275-4.

  4. J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Miami, USA, pp. 248–255, 2009. DOI: https://doi.org/10.1109/CVPR.2009.5206848.

  5. T. Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, C. L. Zitnick. Microsoft COCO: Common objects in context. In Proceedings of the 13th European Conference on Computer Vision, Springer, Zurich, Switzerland, pp. 740–755, 2014. DOI: https://doi.org/10.1007/978-3-319-10602-1_48.

  6. B. L. Zhou, A. Lapedriza, A. Khosla, A. Oliva, A. Torralba. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 6, pp. 1452–1464, 2018. DOI: https://doi.org/10.1109/tpami.2017.2723009.

  7. I. Krasin, T. Duerig, N. Alldrin, V. Ferrari, S. Abu-El-Haija, A. Kuznetsova, H. Rom, J. Uijlings, S. Popov, S. Kamali, M. Malloci, J. Pont-Tuset, A. Veit, S. Belongie, V. Gomes, A. Gupta, C. Sun, G. Chechik, D. Cai, Z. Feng, D. Narayanan, K. Murphy. OpenImages: A public dataset for large-scale multi-label and multi-class image classification, [Online], Available: https://storage.googleapis.com/openimages/web/index.html, October 6, 2019.

  8. J. Tremblay, T. To, A. Molchanov, S. Tyree, J. Kautz, S. Birchfield. Synthetically trained neural networks for learning human-readable plans from real-world demonstrations. In Proceedings of IEEE International Conference on Robotics and Automation, IEEE, Brisbane, Australia, pp. 5659–5666, 2018. DOI: https://doi.org/10.1109/ICRA.2018.8460642.

  9. J. Tremblay, T. To, S. Birchfield. Falling things: A synthetic dataset for 3D object detection and pose estimation. In Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, IEEE, Salt Lake City, USA, pp. 2119–21193, 2018. DOI: https://doi.org/10.1109/cvprw.2018.00275.

  10. B. Calli, A. Singh, A. Walsman, S. Srinivasa, P. Abbeel, A. M. Dollar. The YCB object and model set: Towards common benchmarks for manipulation research. In Proceedings of International Conference on Advanced Robotics, IEEE, Istanbul, Turkey, pp. 510–517, 2015. DOI: https://doi.org/10.1109/ICAR.2015.7251504.

  11. M. Arsenovic, S. Sladojevic, A. Anderla, D. Stefanovic, B. Lalic. Deep learning powered automated tool for generating image based datasets. In Proceedings of the 14th IEEE International Scientific Conference on Informatics, IEEE, Poprad, Slovakia, pp. 13–17, 2017. DOI: https://doi.org/10.1109/informatics.2017.8327214.

  12. J. Sun, P. Wang, Y. K. Luo, G. M. Hao, H. Qiao. Precision work-piece detection and measurement combining top-down and bottom-up saliency. International Journal of Automation and Computing, vol. 15, no. 4, pp. 417–430, 2018. DOI: https://doi.org/10.1007/s11633-018-1123-1.

  13. N. Poolsawad, L. Moore, C. Kambhampati, J. G. F. Cleland. Issues in the mining of heart failure datasets. International Journal of Automation and Computing, vol. 11, no. 2, pp. 162–179, 2014. DOI: https://doi.org/10.1007/s11633-014-0778-5.

  14. X. Y. Gong, H. Su, D. Xu, Z. T. Zhang, F. Shen, H. B. Yang. An overview of contour detection approaches. International Journal of Automation and Computing, vol. 15, no. 6, pp. 656–672, 2018. DOI: https://doi.org/10.1007/s11633-018-1117-z.

  15. A. Aldoma, T. Fäulhammer, M. Vincze. Automation of “ground truth” annotation for multi-view RGB-D object instance recognition datasets. In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, Chicago, USA, pp. 5016–5023, 2014. DOI: https://doi.org/10.1109/IROS.2014.6943275.

  16. K. Lai, L. F. Bo, X. F. Ren, D. Fox. A large-scale hierarchical multi-view RGB-D object dataset. In Proceedings of IEEE International Conference on Robotics and Automation, IEEE, Shanghai, China, pp. 1817–1824, 2011. DOI: https://doi.org/10.1109/icra.2011.5980382.

  17. M. Di Cicco, C. Potena, G. Grisetti, A. Pretto. Automatic model based dataset generation for fast and accurate crop and weeds detection. In Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, Vancouver, Canada, pp. 5188–5195, 2017. DOI: https://doi.org/10.1109/IROS.2017.8206408.

  18. S. Greuter, J. Parker, N. Stewart, G. Leach. Real-time procedural generation of ‘pseudo infinite’ cities. In Proceedings of the 1st International Conference on Computer Graphics and Interactive Techniques in Australasia and South East Asia, ACM, Melbourne, Australia, pp. 87–94, 2003. DOI: https://doi.org/10.1145/604487.604490.

  19. R. Van Der Linden, R. Lopes, R. Bidarra. Procedural generation of dungeons. IEEE Transactions on Computational Intelligence and AI in Games, vol. 6, no. 1, pp. 78–89, 2013. DOI: https://doi.org/10.1109/tciaig.2013.2290371.

  20. S. R. Richter, V. Vineet, S. Roth, V. Koltun. Playing for data: Ground truth from computer games. In Proceedings of 14th European Conference on Computer Vision, Springer, Amsterdam, The Netherlands, pp. 102–118, 2016. DOI: https://doi.org/10.1007/978-3-319-46475-6_7.

  21. P. Marion, P. R. Florence, L. Manuelli, R. Tedrake. LabelFusion: A pipeline for generating ground truth labels for real RGBD data of cluttered scenes. In Proceedings of IEEE International Conference on Robotics and Automation, IEEE, Brisbane, Australia, pp. 3235–3242, 2018. DOI: https://doi.org/10.1109/icra.2018.8460950.

  22. T. Hodan, P. Haluza, S. Obdrzalek, J. Matas, M. Lourakis, X. Zabulis. T-LESS: An RGB-D dataset for 6D pose estimation of texture-less objects. In Proceedings of IEEE Winter Conference on Applications of Computer Vision, IEEE, Santa Rosa, USA, pp. 880–888, 2017. DOI: https://doi.org/10.1109/WACV.2017.103.

  23. H. Hattori, V. Naresh Boddeti, K. Kitani, T. Kanade. Learning scene-specific pedestrian detectors without real data. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Boston, USA, pp. 3819–3827, 2015. DOI: https://doi.org/10.1109/CVPR.2015.7299006.

  24. H. S. Koppula, A. Anand, T. Joachims, A. Saxena. Semantic labeling of 3D point clouds for indoor scenes. In Proceedings of the 24th International Conference on Neural Information Processing Systems, ACM, Red Hook, USA, pp. 244–252, 2011.

  25. J. Xie, M. Kiefel, M. T. Sun, A. Geiger. Semantic instance annotation of street scenes by 3D to 2D label transfer. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Las Vegas, USA, pp. 3688–3697, 2016. DOI: https://doi.org/10.1109/CVPR.2016.401.

  26. B. Zoph, E. D. Cubuk, G. Ghiasi, T. Y. Lin, J. Shlens, Q. V. Le. Learning data augmentation strategies for object detection. ArXiv preprint ArXiv: 1906.11172, 2019.

  27. A. Dutta, A. Zisserman. The VIA annotation software for images, audio and video. ArXiv preprint ArXiv: 1904.10699, 2019.

  28. L. Von Ahn, L. Dabbish. Labeling images with a computer game. In Proceedings of SIGCHI Conference on Human Factors in Computing Systems, ACM, New York, USA, pp. 319–326, 2004. DOI: https://doi.org/10.1145/985692.985733.

  29. C. H. Zhang, K. Loken, Z. Y. Chen, Z. Y. Xiao, G. Kunkel. Mask Editor: An image annotation tool for image segmentation tasks. ArXiv preprint ArXiv: 1809.06461v1, 2018.

  30. B. C. Russell, A. Torralba, K. P. Murphy, W. T. Freeman. LabelMe: A database and web-based tool for image annotation. International Journal of Computer Vision, vol. 77, no. 1–3, pp. 157–173, 2008. DOI: https://doi.org/10.1007/s11263-007-0090-8.

  31. M. Johnson-Roberson, C. Barto, R. Mehta, S. N. Sridhar, K. Rosaen, R. Vasudevan. Driving in the matrix: Can virtual worlds replace human-generated annotations for real world tasks? In Proceedings of IEEE International Conference on Robotics and Automation, IEEE, Singapore, pp. 746–753, 2017. DOI: https://doi.org/10.1109/icra.2017.7989092.

  32. B. T. Phong. Illumination for computer generated pictures. Communications of the ACM, vol. 18, no. 6, pp. 311–317, 1975. DOI: https://doi.org/10.1145/360825.360839.

  33. S. Q. Ren, K. M. He, R. Girshick, J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems, ACM, Cambridge, USA, pp. 91–99, 2015.

  34. W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu, A. C. Berg. SSD: Single shot multibox detector. In Proceedings of the 14th European Conference on Computer Vision, Springer, Amsterdam, The Netherlands, pp. 21–37, 2016. DOI: https://doi.org/10.1007/978-3-319-46448-0_2.

  35. J. Redmon, S. Divvala, R. Girshick, A. Farhadi. You only look once: Unified, real-time object detection. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Las Vegas, USA, pp. 779–788, 2016. DOI: https://doi.org/10.1109/CVPR.2016.91.

  36. F. Q. Liu, Z. Y. Wang. PolishNet-2d and PolishNet-3d: Deep learning-based workpiece recognition. IEEE Access, vol. 7, pp. 127042–127054, 2019. DOI: https://doi.org/10.1109/ACCESS.2019.2940411.

Author information

Corresponding author

Correspondence to Fu-Qiang Liu.

Additional information

Fu-Qiang Liu received the M.Sc. degree in computer technology from Harbin Engineering University, China in 2013. He is currently a Ph.D. candidate in control science and engineering at the College of Automation, Harbin Engineering University, China.

His research interests include computer vision, deep learning, artificial intelligence, deep neural networks, simultaneous localization and mapping (SLAM), and robotics.

Zong-Yi Wang received the Ph.D. degree in control theory and control engineering from Harbin Engineering University, China in 2005. He is a professor at the College of Automation, Harbin Engineering University, China. He won the first prize of the Heilongjiang Provincial Scientific and Technological Progress Award in 2004.

His research interests include computer vision, robotics, and intelligent welding and cutting.

About this article

Cite this article

Liu, FQ., Wang, ZY. Automatic “Ground Truth” Annotation and Industrial Workpiece Dataset Generation for Deep Learning. Int. J. Autom. Comput. 17, 539–550 (2020). https://doi.org/10.1007/s11633-020-1221-8
