Introduction

To address the low recognition consistency and the limited automation and intelligence of sorting on multi-part automatic assembly lines, machine vision is introduced to acquire color-depth (RGB-D) images, analyze target features and depth, calculate the three-dimensional position of the target parts, and realize automatic grasping and assembly on the line.

Kumawat [1] proposed a fast feature point detection algorithm; the results show that the time required to detect five feature points is relatively short when a mixed feature detector is used. Hu [2] proposed and evaluated several state-of-the-art feature description algorithms, providing guidance for designing new ones. Tang [3] proposed a robust matching method that filters out the least important local features before matching, effectively ensuring the performance of local feature matching. Martin [4] constructed geometric features by discovering existing relationships in the data. Shi [5] designed a matching algorithm based on grid topology; the results show that local features with multi-line descriptors are more robust than other classic patch-based features. Li [6] proposed a Harris multi-scale corner detection algorithm based on contour transformation; the detected corners are distributed more uniformly and reasonably and can be used in fields such as image mosaicking. Liu [7] used continuous data frames to build sub-maps and obtained a general modeling method for parallel manipulators. Endres [8] used the distances between feature descriptors of depth images to build a minimum spanning tree, removed data that were close in time, and then randomly selected K frames with a random-forest method to detect loop closures, meeting real-time detection requirements to some extent. Guo [9] combined the Tri-Spin-Image (TriSI) algorithm with different weights, selected the best neighborhood structure, and improved the local coordinate system. Salti [10] proposed the SHOT feature (Signature of Histograms of OrienTations), whose feature histograms are obtained by constructing a point-cloud topological structure, with good rotation invariance and robustness.

In this paper, the KINECT is taken as the research object [11, 12]. First, the internal and external parameters of the camera are calibrated by the checkerboard calibration method; second, the feature points of the target parts are extracted with the SIFT algorithm, and their similarity to the feature points of the reference target parts is compared to recognize the target parts; finally, the depth image is used to calculate the position parameters of the recognized target parts, and their three-dimensional coordinates are obtained to complete the positioning. This research can improve the efficiency of recognition and positioning on a multi-part automatic assembly line and provide a theoretical basis and experimental support for the realization of an intelligent automatic assembly line.

Calibration of internal and external parameters of the RGB-D camera

There are two main reasons for camera calibration: first, the degree of lens distortion differs from lens to lens owing to production and assembly tolerances, and calibration allows this distortion to be corrected in the captured images; second, the geometric imaging model of the camera is built from the parameters obtained after calibration, so that the three-dimensional scene can be reconstructed from the acquired images [13,14,15].

A camera model describes the transformation between actual points in space and the corresponding points in the two-dimensional image. The internal and external parameters of the RGB-D camera are obtained using a simple and practical camera model and calibration method. In this paper, the pinhole model is selected, and the four coordinate systems and the transformations between them are derived. The four reference coordinate systems are the world coordinate system, the camera coordinate system, the image coordinate system, and the pixel coordinate system.

Without considering distortion, a three-dimensional space point is projected to Pu(x, y) in the image coordinate system. Taking the upper left corner of the imaging plane as the origin and pixels as the unit, the pixel coordinates are represented by (u, v) [16,17,18]. Both the image coordinates and the pixel coordinates lie on the camera imaging plane, but they differ in origin and unit. Using dx and dy to denote the physical size of one pixel along the X and Y axes of the image plane, the transformation between image coordinates and pixel coordinates is:

$$ \left\{ {\begin{array}{*{20}c} {u = \frac{x}{{d_{x} }} + u_{0} } \\ {v = \frac{y}{{d_{y} }} + v_{0} } \\ \end{array} } \right.. $$
(1)

The above formula is expressed by a matrix as follows:

$$ \left[ {\begin{array}{*{20}c} u \\ v \\ 1 \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {\frac{1}{{d_{x} }}} &\quad 0 &\quad {u_{0} } \\ 0 &\quad {\frac{1}{{d_{y} }}} &\quad {v_{0} } \\ 0 &\quad 0 &\quad 1 \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} x \\ y \\ 1 \\ \end{array} } \right]. $$
(2)

According to the geometric relationship shown in Fig. 1, the transformation between the image coordinate system and the camera coordinate system is given by formula (3):

$$ \left\{ {\begin{array}{*{20}c} {x = \frac{f}{{Z_{{\text{c}}} }}X_{{\text{c}}} } \\ {y = \frac{f}{{Z_{{\text{c}}} }}Y_{{\text{c}}} } \\ \end{array} } \right.. $$
(3)
Fig. 1

Schematic diagram of the geometric relationship between the imaging plane and the camera coordinate system

The matrix is expressed as:

$$ Z_{{\text{c}}} \left[ {\begin{array}{*{20}c} x \\ y \\ 1 \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} f &\quad 0 &\quad 0 &\quad 0 \\ 0 &\quad f &\quad 0 &\quad 0 \\ 0 &\quad 0 &\quad 1 &\quad 0 \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {X_{{\text{c}}} } \\ {Y_{{\text{c}}} } \\ {Z_{{\text{c}}} } \\ 1 \\ \end{array} } \right], $$
(4)

where f denotes the focal length. The conversion between the camera coordinate system and the world coordinate system can be expressed by formula (5):

$$ \left[ {\begin{array}{*{20}c} {X_{{\text{c}}} } \\ {Y_{{\text{c}}} } \\ {Z_{{\text{c}}} } \\ 1 \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} R &\quad t \\ {0^{T} } &\quad 1 \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {X_{{\text{w}}} } \\ {Y_{{\text{w}}} } \\ {Z_{{\text{w}}} } \\ 1 \\ \end{array} } \right]. $$
(5)

Combining formulas (2)–(5) yields

$$ Z_{{\text{c}}} \left[ {\begin{array}{*{20}c} u \\ v \\ 1 \\ \end{array} } \right] = H\left[ {\begin{array}{*{20}c} R &\quad t \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {X_{{\text{w}}} } \\ {Y_{{\text{w}}} } \\ {Z_{{\text{w}}} } \\ 1 \\ \end{array} } \right]. $$
(6)

where:

$$ H = \left[ {\begin{array}{*{20}c} {f_{x} } &\quad 0 &\quad {c_{x} } \\ 0 &\quad {f_{y} } &\quad {c_{y} } \\ 0 &\quad 0 &\quad 1 \\ \end{array} } \right]. $$
(7)

In formulas (6) and (7), Zc is the depth value of the pixel, fx and fy are the focal lengths of the camera along the X and Y axes, and cx and cy are the coordinates of the optical center. H is the camera's internal parameter matrix and represents the internal characteristics of the camera. [R t] describes the rotation and translation of the camera relative to the world coordinate system, i.e., the external parameters.
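As a minimal illustration of formulas (5)–(7), the following C++ sketch projects a point into pixel coordinates. The intrinsic values and the test point are placeholders rather than the calibrated values obtained below, and the camera frame is taken to coincide with the world frame for simplicity.

```cpp
#include <iostream>

// Minimal sketch of formulas (5)-(7): project a point into pixel coordinates.
// The intrinsic values are placeholders; the actual values come from the
// checkerboard calibration described in the next paragraphs.
int main() {
    // Intrinsics (formula 7) -- hypothetical values
    const double fx = 525.0, fy = 525.0, cx = 319.5, cy = 239.5;

    // Extrinsics: for illustration the camera frame coincides with the world
    // frame, i.e. R = I and t = 0, so Pc = Pw (formula 5).
    double Xc = 0.10, Yc = 0.05, Zc = 0.80;   // camera-frame point (m)

    // Formula (6): Zc * [u v 1]^T = H * [Xc Yc Zc]^T
    double u = fx * Xc / Zc + cx;
    double v = fy * Yc / Zc + cy;

    std::cout << "pixel: (" << u << ", " << v << ")\n";
    return 0;
}
```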

The checkerboard method has good robustness and practicability. Fifteen checkerboard images at different positions, angles, and poses are taken, with a black-and-white rectangular checkerboard pattern used as the calibration board. Calibration is carried out in MATLAB, as shown in Fig. 2.

Fig. 2

Chessboard calibration diagram
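The calibration itself is performed in MATLAB; purely for reference, an equivalent sketch using OpenCV's calibration routines is given below. The board size (9 × 6 inner corners), the square size, and the image file names are assumptions.

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/calib3d/calib3d.hpp>
#include <iostream>
#include <sstream>
#include <vector>

// Sketch of checkerboard calibration with OpenCV 2.4.x; the paper performs
// this step in MATLAB. Board size, square size, and file names are assumptions.
int main() {
    const cv::Size boardSize(9, 6);
    const float squareSize = 25.0f;  // mm (assumed)

    std::vector<std::vector<cv::Point3f> > objectPoints;
    std::vector<std::vector<cv::Point2f> > imagePoints;
    cv::Size imageSize;

    // Ideal corner positions on the board plane (Z = 0)
    std::vector<cv::Point3f> board;
    for (int i = 0; i < boardSize.height; ++i)
        for (int j = 0; j < boardSize.width; ++j)
            board.push_back(cv::Point3f(j * squareSize, i * squareSize, 0.f));

    for (int k = 1; k <= 15; ++k) {
        std::ostringstream name;
        name << "board_" << k << ".png";                       // assumed names
        cv::Mat gray = cv::imread(name.str(), CV_LOAD_IMAGE_GRAYSCALE);
        if (gray.empty()) continue;
        imageSize = gray.size();

        std::vector<cv::Point2f> corners;
        if (cv::findChessboardCorners(gray, boardSize, corners)) {
            cv::cornerSubPix(gray, corners, cv::Size(11, 11), cv::Size(-1, -1),
                cv::TermCriteria(CV_TERMCRIT_EPS + CV_TERMCRIT_ITER, 30, 0.1));
            imagePoints.push_back(corners);
            objectPoints.push_back(board);
        }
    }
    if (imagePoints.empty()) return -1;

    cv::Mat cameraMatrix, distCoeffs;
    std::vector<cv::Mat> rvecs, tvecs;   // extrinsics of each board view
    double rms = cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                                     cameraMatrix, distCoeffs, rvecs, tvecs);
    std::cout << "RMS reprojection error: " << rms << "\n"
              << "intrinsics:\n" << cameraMatrix << "\n"
              << "distortion: " << distCoeffs << std::endl;
    return 0;
}
```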

The internal parameters of the camera obtained before calibration are shown in Fig. 3.

Fig. 3

Camera internal parameters before calibration

These internal parameters before calibration are listed in Table 1.

Table 1 Camera internal parameters before calibration

The internal parameters of the calibrated camera are shown in Fig. 4.

Fig. 4

Calibrated camera internal parameters

The calibrated camera parameters are shown in Table 2.

Table 2 Calibrated camera internal parameters

Clicking Show Extrinsic displays the positions of the calibration images relative to the camera from the camera's viewpoint (i.e., with the camera's position and orientation held fixed). The positions of the calibration images relative to the camera are shown in Fig. 5.

Fig. 5

The position of the calibrated image relative to the camera

Here, 1–15 indicate the positions of the calibration images relative to the camera, and Oc(Xc, Yc, Zc) denotes the position of the camera.

Feature extraction and matching

Feature detection uses the computer to extract image information and determine whether each image point belongs to an image feature. The result of feature detection is that the points of the image are divided into different subsets, which typically correspond to isolated points, continuous curves, or continuous regions. Features extracted from different images of the same scene should be the same. Instead of observing the whole image, some feature points are extracted from the image and analyzed locally. Feature points are required to be sufficient in number, stable, and accurately localized.

Visual invariance is a very important concept in feature detection, but scale invariance is a difficult problem to solve. To address it, the concept of scale-invariant features was introduced in computer vision. The idea is not only that consistent key points can be detected on an object photographed at any scale, but also that each detected feature point is associated with a scale factor. Ideally, for the same object point appearing at different scales in two images, the ratio of the two computed scale factors should equal the ratio of the image scales. In recent years, many scale-invariant features have been proposed; this section introduces one of them, the SURF feature (Speeded-Up Robust Features). SURF is a scale-invariant feature method that provides a robust detector and descriptor and can be used for target recognition or three-dimensional reconstruction in computer vision. Part of its inspiration comes from the SIFT algorithm, which uses local gradient histograms; the main difference lies in performance, as SURF reduces computation time by making effective use of the integral image when computing image convolutions. To detect scale-invariant features, maxima must be computed in both image space and scale space. As we shall see, SURF features are not only scale invariant but also computationally efficient.

Feature point matching refers to finding the correctly corresponding feature points in the two images to be registered. The procedure is to first find points with salient features, then describe each of the two feature points separately, and finally compare the similarity of the two descriptions to decide whether they are the same feature. If the scale can be determined before the feature is described, scale invariance can be achieved; as noted above, SURF provides such a scale-invariant detector and descriptor with good robustness, while using the integral image to keep the computation time low.

The SURF algorithm mainly comprises integral-image computation, scale-space construction, localization and orientation assignment of feature points, and generation of feature descriptors. The integral image is computed so that the value of each element is the sum of all pixels above and to the left of it, which allows the Hessian-matrix determinant at each pixel to be evaluated efficiently:

$$ I_{{\sum {} }} (x,y) = \sum\limits_{i = 0}^{i \le x} {\sum\limits_{j = 0}^{j \le y} {I(i,j)} } . $$
(8)
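As a minimal illustration of formula (8), the following sketch computes the integral image directly; OpenCV's cv::integral produces the same values, padded with an extra leading row and column of zeros.

```cpp
#include <opencv2/core/core.hpp>

// Straightforward implementation of formula (8): each element of the integral
// image is the sum of all pixels above and to the left of it (inclusive).
cv::Mat integralImage(const cv::Mat& gray)   // gray: CV_8UC1
{
    cv::Mat sum(gray.rows, gray.cols, CV_32SC1, cv::Scalar(0));
    for (int y = 0; y < gray.rows; ++y) {
        int rowSum = 0;                      // running sum of the current row
        for (int x = 0; x < gray.cols; ++x) {
            rowSum += gray.at<uchar>(y, x);
            sum.at<int>(y, x) = rowSum + (y > 0 ? sum.at<int>(y - 1, x) : 0);
        }
    }
    return sum;   // cv::integral(gray, sum) gives the same values with an
                  // extra leading row and column of zeros
}
```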

The scale space is constructed so that feature points can be detected from an approximation of the determinant of Hessian (DoH); the second-order differential Hessian matrix is:

$$ H(f(x,y)) = \left[ {\begin{array}{*{20}c} {\frac{{\partial^{2} f}}{{\partial x^{2} }}} &\quad {\frac{{\partial^{2} f}}{\partial x\partial y}} \\ {\frac{{\partial^{2} f}}{\partial x\partial y}} &\quad {\frac{{\partial^{2} f}}{{\partial y^{2} }}} \\ \end{array} } \right]. $$
(9)
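In SURF, the second derivatives in formula (9) are approximated with box filters evaluated on the integral image. Purely to illustrate the determinant-of-Hessian response, the sketch below computes it with ordinary Sobel second derivatives instead of box filters.

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

// Illustration of the determinant-of-Hessian (DoH) response in formula (9).
// SURF approximates the second derivatives with box filters on the integral
// image; plain Sobel second derivatives are used here only for clarity.
cv::Mat hessianResponse(const cv::Mat& gray)   // gray: CV_8UC1
{
    cv::Mat g, dxx, dyy, dxy;
    cv::GaussianBlur(gray, g, cv::Size(5, 5), 1.2);   // smooth first
    cv::Sobel(g, dxx, CV_32F, 2, 0);                  // d2f/dx2
    cv::Sobel(g, dyy, CV_32F, 0, 2);                  // d2f/dy2
    cv::Sobel(g, dxy, CV_32F, 1, 1);                  // d2f/dxdy
    return dxx.mul(dyy) - dxy.mul(dxy);               // det(H) per pixel
}
```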

Because the filter size, rather than the image size, is changed between layers, the SURF algorithm can process every layer of the pyramid in parallel, building the scale-space pyramid from the Hessian-matrix responses.

OpenCV 2.4.10 and Visual Studio 2013 are selected as the software environment. First, the image information is read in: the two PNG images produced by the RGB-D camera in each experiment are loaded in grayscale mode. The key points of both images are then detected with the SURF detector, with the minimum Hessian response set to 400. Descriptors are computed for the detected key points, and the descriptor vectors are matched by brute-force matching. The matches between the two images are then drawn, and the matching results are obtained, as shown in Table 3.

Table 3 Matching test results

Comparing the images pairwise, the matching degree between the square object and the tire is only about 20%, that between the cylinder and the tire is about 30%, and that between the tire and the tire template is about 80%. When the matching degree reaches about 80%, the system determines that the part is the target part.
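A sketch of the matching pipeline described above, using the OpenCV 2.4.x API (SURF resides in the nonfree module), is given below. The file names and the distance threshold used to count good matches are assumptions; the paper only specifies the minimum Hessian of 400 and brute-force matching.

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/features2d/features2d.hpp>
#include <opencv2/nonfree/features2d.hpp>   // SURF lives in nonfree in OpenCV 2.4
#include <iostream>
#include <vector>

// Sketch of the SURF detection and brute-force matching pipeline.
// File names and the 0.25 distance threshold are assumptions.
int main() {
    cv::Mat img1 = cv::imread("template.png", CV_LOAD_IMAGE_GRAYSCALE);
    cv::Mat img2 = cv::imread("scene.png",    CV_LOAD_IMAGE_GRAYSCALE);
    if (img1.empty() || img2.empty()) return -1;

    // 1. Detect SURF key points (minimum Hessian response 400)
    cv::SurfFeatureDetector detector(400);
    std::vector<cv::KeyPoint> kp1, kp2;
    detector.detect(img1, kp1);
    detector.detect(img2, kp2);

    // 2. Compute descriptors
    cv::SurfDescriptorExtractor extractor;
    cv::Mat desc1, desc2;
    extractor.compute(img1, kp1, desc1);
    extractor.compute(img2, kp2, desc2);

    // 3. Brute-force matching of descriptor vectors
    cv::BFMatcher matcher(cv::NORM_L2);
    std::vector<cv::DMatch> matches;
    matcher.match(desc1, desc2, matches);

    // 4. Count matches below a distance threshold and report the match ratio
    int good = 0;
    for (size_t i = 0; i < matches.size(); ++i)
        if (matches[i].distance < 0.25) ++good;              // assumed threshold
    double ratio = matches.empty() ? 0.0 : 100.0 * good / matches.size();
    std::cout << "matching degree: " << ratio << "%" << std::endl;

    // 5. Visualize the matches
    cv::Mat vis;
    cv::drawMatches(img1, kp1, img2, kp2, matches, vis);
    cv::imshow("matches", vis);
    cv::waitKey(0);
    return 0;
}
```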

Feature extraction and description are the basis of image processing and computer vision. In practical problems, images are affected by noise and background interference, as well as by changes in viewing angle, lighting, scale, translation, rotation, and affine transformations. How to choose a reasonable feature descriptor and image feature operator, so that the features are both discriminative and invariant under the above changes, directly determines the effectiveness of feature-based image processing. Based on the invariance theory of computer vision, the study of image-feature invariance has become an important link in image processing and has attracted the interest of many researchers.

Three-dimensional positioning of target parts

Positioning obtains the position information of the target part in the image. Among the many parts on the assembly line, the identified target parts are selected with rectangular bounding boxes. When an image contains multiple target parts (and the number is not fixed), the detection task should locate each target in the image with a rectangular box, which amounts to localizing multiple targets.

According to the imaging principle of the camera, the measuring coordinate system is constructed, and the internal and external parameters of the color camera and the depth camera are solved.

Object recognition is implemented based on OpenCV. Grayscale processing is applied to the color images collected by the Kinect, removing the complicating influence of the multiple color channels and making image processing more concise. An improved mean-filtering algorithm is used to smooth small noise on the image surface and reduce sharp changes. Image binarization presents the gray values in black and white, so that multi-level pixel values are no longer involved, simplifying the processing and reducing the amount of data to be processed and compressed. A closing operation is used to fill small holes, smooth the boundary, and eliminate small, useless voids. Based on the gray values of the object pixels, the object contour is extracted, a minimum bounding rectangle is created from the contour point set, and the object is thereby selected.

This realizes object recognition and localization in a simple environment: from the collected image, the contours are analyzed, the target object is judged, recognized, and selected, the object center is calculated, and the coordinates of the object are obtained and output. A sketch of this pipeline is given below.
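The following OpenCV sketch follows the steps described above; the threshold value, kernel sizes, minimum contour area, and file name are assumptions, and a plain box blur stands in for the improved mean filter.

```cpp
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <iostream>
#include <vector>

// Sketch of the recognition-and-localization pipeline: grayscale, mean filter,
// binarization, closing, contour extraction, bounding box, object center.
// Threshold, kernel sizes, area limit, and file name are assumptions.
int main() {
    cv::Mat color = cv::imread("kinect_rgb.png");              // assumed name
    if (color.empty()) return -1;

    cv::Mat gray, smooth, bin;
    cv::cvtColor(color, gray, CV_BGR2GRAY);                    // grayscale
    cv::blur(gray, smooth, cv::Size(5, 5));                    // mean filter
    cv::threshold(smooth, bin, 100, 255, CV_THRESH_BINARY);    // binarization

    // Closing: fill small holes and smooth the object boundary
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(7, 7));
    cv::morphologyEx(bin, bin, cv::MORPH_CLOSE, kernel);

    // Extract contours and fit an upright bounding rectangle to each object
    std::vector<std::vector<cv::Point> > contours;
    cv::findContours(bin, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);
    for (size_t i = 0; i < contours.size(); ++i) {
        if (cv::contourArea(contours[i]) < 500) continue;      // skip noise
        cv::Rect box = cv::boundingRect(contours[i]);
        cv::rectangle(color, box, cv::Scalar(0, 255, 0), 2);

        // Object center from the contour moments
        cv::Moments m = cv::moments(contours[i]);
        double cx = m.m10 / m.m00, cy = m.m01 / m.m00;
        std::cout << "center pixel: (" << cx << ", " << cy << ")" << std::endl;
    }

    cv::imshow("result", color);
    cv::waitKey(0);
    return 0;
}
```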

After the corresponding program is run, it controls the KINECT to acquire the depth image of the target part, as shown in Fig. 6, and determines the depth information of the target part relative to the camera.

Fig. 6

Depth image of the RGB-D camera

By fusing the RGB image and the depth image, the spatial position parameters of the target parts are obtained, and with them their three-dimensional coordinates. The tire is the target part for which the location information is obtained: the four points in the upper left corner of the figure are four different points on the tire, through which the spatial position of the tire is determined. The coordinates of the four points are listed in Table 4.

Table 4 Four tire locations
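As a minimal sketch of how a pixel of the target part and its depth value yield the three-dimensional camera coordinates (the inverse of formula (6), with the camera frame as reference), the following example uses placeholder intrinsic values and a hypothetical pixel and depth reading for the tire center.

```cpp
#include <iostream>

// Back-projection used for positioning: inverting formula (6), a pixel (u, v)
// together with its depth value Zc gives the 3-D point in camera coordinates.
// Intrinsics are placeholders for the calibrated values; depth assumed in mm.
struct Point3 { double x, y, z; };

Point3 pixelToCamera(double u, double v, double depthMm,
                     double fx, double fy, double cx, double cy)
{
    Point3 p;
    p.z = depthMm / 1000.0;             // mm -> m
    p.x = (u - cx) * p.z / fx;
    p.y = (v - cy) * p.z / fy;
    return p;
}

int main() {
    // Hypothetical pixel of the tire centre and its depth reading
    Point3 p = pixelToCamera(345.0, 278.0, 812.0, 525.0, 525.0, 319.5, 239.5);
    std::cout << "camera coordinates (m): ("
              << p.x << ", " << p.y << ", " << p.z << ")" << std::endl;
    return 0;
}
```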

Summary

One of the core components of a multi-part automatic assembly line is the target-part identification and positioning system, and the accuracy of identification and positioning affects the efficiency of the automatic assembly line.

First, this paper calibrates the depth camera of the Kinect to obtain more accurate internal parameters and improve the accuracy of 3D reconstruction. Then, the scale-invariant feature transform (SIFT) algorithm is used to extract image information by computer, and feature description is carried out to determine whether each image point belongs to an image feature; how to extract the feature points from the image is also discussed. Finally, a simple three-dimensional object reconstruction method is proposed, covering the construction of the visual perception system, object image acquisition, object model construction, and the object recognition and positioning algorithm against a single background, completing the research on object recognition and positioning in a simple scene.

The experimental results show that the recognition and sorting rate of target parts on the multi-part automatic assembly line, using the recognition and positioning system designed with machine vision, is significantly higher than that of traditional sorting.

This research can improve the efficiency of identifying and positioning multiple parts on an automatic assembly line, and it is hoped that it provides a theoretical basis and experimental support for the realization of an intelligent automatic assembly line.