
1 Introduction

In modern port automation systems, gantry rail-mounted container cranes play a critical role, efficiently and swiftly transferring containers between transportation equipment and container storage areas. Container hoisting operations primarily involve two key alignment steps, as shown in Fig. 1.

Fig. 1
An illustration of a container lifting operation where a rail-mounted gantry with a spreader is lifting containers from a container yard onto a container train. Two rectangular mechanical components are on the right.

Container lifting operation

The increasing demand for automation technology in container docks stems from the inefficiencies of manual alignment operations, characterized by low precision and heightened error probability due to the significant distance between the driver's cabin and hoisting equipment. Current technologies predominantly utilize laser radar for container positioning, identifying and measuring geometric shapes. Despite its resistance to adverse weather and lighting, the technology is marred by high costs and limited accuracy.

Machine vision and its associated algorithms have significantly advanced dock container automation, facilitating tasks such as container identification and safety inspections [1]. However, the existing research, including the stereo vision-based positioning method proposed by Yoon et al., encounters limitations in accuracy, with errors sometimes reaching up to 60 mm, primarily due to the baseline length between cameras and confined space on lifting equipment [2]. In real dock settings, image processing algorithms grapple with environmental factors affecting image clarity, such as container color variations and surface contamination. Current methods, although capable of locating containers within cabins, necessitate substantial computation time, extending single image processing to 0.6 s [3]. Moreover, existing vision tracking methods, despite their speed, fail to meet the stringent accuracy requirements of container hoisting operations due to simplistic algorithm structures [4].

Addressing these challenges, machine learning emerges as a promising avenue. Recent advancements in deep learning have revolutionized industrial detection and recognition. Studies by Qi and others [5], Kazmi and others [6], and Liu and Wang [7] underscore the potential of deep learning in enhancing product quality and manufacturing efficiency. New methodologies, such as those proposed by Zhang and others [8], He and Liu [9], and Yu and others [10], are paving the way for more precise and efficient container positioning technologies, heralding a new era in port automation systems.

The aforementioned methods primarily target object recognition, detection, and measurement in the industrial sector, mainly applied within factory indoor environments utilizing specialized cameras and computing equipment. However, container automation loading and unloading necessitate adapting to all-weather outdoor scenarios, and due to limited installation space within port equipment, high-computational power devices cannot be employed. Therefore, implementing an efficient all-weather lightweight recognition network under limited computational power conditions presents a current challenge. Given that recognition and measurement methods based on deep learning still exhibit slow response speeds and poor accuracy in container target detection, this paper proposes a fast container target identification and measurement device and method for automated loading and unloading. By compressing and optimizing traditional deep learning networks and integrating container appearance features, the device and method described in this paper achieve higher detection efficiency compared to conventional methods.

2 Vision-Based Measuring System

In this study, a vision-based measurement system structure is proposed, as shown in Fig. 2. The primary objective is to accurately determine the pose information of the container. The system first detects the container corners in the image and then uses this data to calculate the exact position of the container. Upon completion of all steps, the positional data is transmitted to the ACCS (Automated Crane Control System) to facilitate precise control of the hoisting actions.

Fig. 2
A flowchart runs as follows. Camera, rapid detection, precise corner hole localization, geometric parameter calculation, output, container position.

A vision-based measurement system structure

2.1 Image Capture Device

The system proposed in this paper relies on multiple cameras to capture images of the top of the container; the installation of the device is illustrated in Fig. 3.

Fig. 3
An illustration of a rail-mounted gantry equipped with a camera, capturing images of a container train. Two insets depict the camera attached to the gantry container train and lock hole corner casting area.

Installation of cameras and the image captured by cameras

2.2 Improved SSD Image Processing Section

The image processing section is based on an improved SSD, a model built on convolutional neural networks (CNNs). Fig. 4 shows the basic model based on SSD-300 (with an input image size of 300 × 300) and its structural components. To adapt the SSD model to the detection of container corners, the following two main improvements were made to the SSD-300 base model:

Fig. 4
A schematic of the integration of ResNet 50 into an S S D 300 model includes feature layers. The modifications replace the backbone layer with ResNet 50 and remove the high-level feature layers, resulting in detection and non-maximum suppression.

The basic model based on SSD-300 and two main improvements

Backbone Layer Update: The DSSD (Deconvolutional Single Shot Detector) approach is adopted for the improved detector: VGG-16 is replaced with ResNet-50 as the backbone layer, which strengthens the representation of shallow features and increases the recognition rate of small targets. The deeper ResNet retains more feature information, thereby improving robustness to small targets.

Feature Map Layer Adjustment: The feature map layer of the basic SSD model was optimized. Since the original high-level feature map layer is relatively insensitive to the recognition of small targets, the Conv10_2 and Conv11_2 layers were removed to accelerate detection speed, while adding a feature map layer with higher resolution to enhance the recognition ability for small targets at container corners.
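To make the feature-map adjustment concrete, the sketch below tallies the default (prior) boxes per feature layer for standard SSD-300 and for a modified layout with Conv10_2 and Conv11_2 removed. The standard layer resolutions and boxes-per-cell counts come from the original SSD-300 design; the 75 × 75 resolution of the added high-resolution layer is an illustrative assumption, not a value from this paper.

```python
def total_default_boxes(layers):
    """layers: list of (feature_map_size, boxes_per_cell) pairs.
    Each cell of an s x s feature map proposes a fixed number of boxes."""
    return sum(size * size * boxes for size, boxes in layers)

# Standard SSD-300 layers: Conv4_3, Conv7, Conv8_2, Conv9_2, Conv10_2, Conv11_2
standard = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]

# Modified layout: Conv10_2 (3x3) and Conv11_2 (1x1) removed, and a
# hypothetical 75x75 high-resolution layer added for small corner targets.
modified = [(75, 4), (38, 4), (19, 6), (10, 6), (5, 6)]

print(total_default_boxes(standard))  # 8732
```

Dropping the two coarsest layers removes only 40 of the 8732 standard boxes, so almost all of the added cost (and small-target coverage) comes from the high-resolution layer.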

2.3 High-Precision Calculation Method for Container Pose Parameters

To precisely adjust the control strategy of the lifting equipment, the pose parameters of the container are divided into two parts: the offset vector of the container's geometric center on the horizontal plane, and the deviation angle of the container on the horizontal plane about its geometric center. The dual-camera system also raises the issue of measurement data fusion: when calculating the pose parameters, the measurement results obtained from the two independent cameras are averaged to derive more accurate overall pose information.

The displacement vector of the container position is denoted as \(l=(\Delta x,\Delta y)\), which characterizes the planar offset between the currently detected position of the container and its theoretical position under standard working conditions. As shown in Fig. 5, to make fuller use of the information obtained from the dual cameras, the displacement vector is calculated from the spatial coordinates of the four detected container corners and their calibrated positions under standard working conditions, as detailed in Eqs. (1) and (2).

$$\Delta x=\frac{1}{4}\left({x}_{a1}+{x}_{b1}+{x}_{c1}+{x}_{d1}-{x}_{a0}-{x}_{b0}-{x}_{c0}-{x}_{d0}\right)$$
(1)
$$\Delta y=\frac{1}{4}\left({y}_{a1}+{y}_{b1}+{y}_{c1}+{y}_{d1}-{y}_{a0}-{y}_{b0}-{y}_{c0}-{y}_{d0}\right)$$
(2)
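As a minimal sketch of Eqs. (1) and (2), the center offset can be computed as the mean corner-wise difference between the detected corner coordinates and their calibrated reference positions (the corner ordering a, b, c, d is an assumption):

```python
def displacement_vector(detected, reference):
    """detected, reference: four (x, y) corner coordinates in the
    order a, b, c, d. Returns (dx, dy), the mean per-corner offset,
    which equals the shift of the container's geometric center."""
    dx = sum(x1 - x0 for (x1, _), (x0, _) in zip(detected, reference)) / 4.0
    dy = sum(y1 - y0 for (_, y1), (_, y0) in zip(detected, reference)) / 4.0
    return dx, dy
```

Because the mean of the corner offsets equals the offset of the centroid, translating every corner by the same vector returns exactly that vector.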

On the other hand, the deviation angle of the container is defined as the counterclockwise angle of deviation relative to its standard working state (i.e., parallel to the observation plane), as shown in Fig. 6.

Fig. 5
A schematic has a rectangular plane with a central point P 0 and vertices A 0, B 0, D 0, and C 0 with an overlapping deflected plane with the central point P and vertices A 1, B 1, C 1, and D 1. In the center, the labels are delta x and delta y.

The plane between the current detected position of the container and its theoretical position under standard working conditions

Fig. 6
A schematic has a rectangular plane with a central point P 0 and vertices A 0, B 0, D 0, and C 0 with an overlapping deflected plane with the central point P 2 and vertices A 2, B 2, C 2, and D 2.

Calculation of deflection angle

The calculation of this parameter also integrates the lock hole coordinate information from the images of the front and rear cameras. The angle is determined from the inclination of the short sides of the container in the detected state relative to the standard state, as detailed in Eqs. (3)–(5).

$$\theta = \arccos\left[\frac{1}{2}\left(\frac{a_{1}\cdot a_{3}}{\left|a_{1}\right|\cdot \left|a_{3}\right|}+\frac{a_{2}\cdot a_{4}}{\left|a_{2}\right|\cdot \left|a_{4}\right|}\right)\right]$$
(3)
$${a}_{1}=\left({x}_{a2}-{x}_{d2},{y}_{a2}-{y}_{d2}\right),\quad {a}_{2}=\left({x}_{b2}-{x}_{c2},{y}_{b2}-{y}_{c2}\right)$$
(4)
$${a}_{3}=\left({x}_{a0}-{x}_{d0},{y}_{a0}-{y}_{d0}\right),\quad {a}_{4}=\left({x}_{b0}-{x}_{c0},{y}_{b0}-{y}_{c0}\right)$$
(5)
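A sketch of Eqs. (3)–(5) in code, assuming the bracketed term is the mean cosine between corresponding short-side vectors of the detected and reference states, and assuming the corner ordering a, b, c, d:

```python
import math

def deflection_angle(detected, reference):
    """detected, reference: four (x, y) corners in the order a, b, c, d.
    Builds the short-side vectors a-d and b-c in both states (Eqs. (4)
    and (5)) and returns the deviation angle in radians as the arccos
    of their mean pairwise cosine (Eq. (3))."""
    (xa2, ya2), (xb2, yb2), (xc2, yc2), (xd2, yd2) = detected
    (xa0, ya0), (xb0, yb0), (xc0, yc0), (xd0, yd0) = reference
    a1 = (xa2 - xd2, ya2 - yd2)   # detected side a-d
    a2 = (xb2 - xc2, yb2 - yc2)   # detected side b-c
    a3 = (xa0 - xd0, ya0 - yd0)   # reference side a-d
    a4 = (xb0 - xc0, yb0 - yc0)   # reference side b-c

    def cosine(u, v):
        return (u[0] * v[0] + u[1] * v[1]) / (math.hypot(*u) * math.hypot(*v))

    mean_cos = 0.5 * (cosine(a1, a3) + cosine(a2, a4))
    return math.acos(max(-1.0, min(1.0, mean_cos)))  # clamp rounding error
```

Rotating all four corners rigidly about the center returns exactly the rotation angle, and at zero deflection the mean cosine is 1, giving an angle of 0.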

Through this approach, it is anticipated that the real-time pose parameters of the container can be calculated more accurately, thereby providing more accurate and stable control instructions for the hoisting equipment.

3 Experimental Result

The device described in this paper has been deployed in a railway automated container yard. Experiments were conducted on this platform; the image data used in the experiments was captured by high-resolution dome cameras installed at designated positions, with hardware parameters as shown in Table 1. The detailed configuration of the computing platform running the measurement algorithm is given in Table 2; its hardware performance matches that of industrial computers commonly used in industrial control.

Table 1 Hardware parameters
Table 2 Computing platform

3.1 Verification of the Improved SSD Image Processing Section

The optimized SSD algorithm was trained on 8700 images of the upper surfaces of containers, captured during actual operation of container lifting equipment. These images cover day and night lighting conditions, and each contains visual data for about 2 to 6 container corner areas.

The performance evaluation of the improved SSD detector is divided into two stages. The first stage compares the detection performance of the SSD model before and after modification; the experimental group comprises the improved SSD-300 model proposed in this paper and the original SSD-300 model. 500 sample images taken at different operating times, each annotated with multiple container corner features, are used to evaluate the detection performance of the two networks. As shown in Table 3, detection performance is measured by Average Precision (AP). The experiments show that the optimized SSD network proposed in this study surpasses the original version in both detection accuracy and speed, reducing computation time by 5.35 ms and increasing AP by 3.45%. Corner area detection results of the optimized SSD algorithm are shown in Fig. 7.

Table 3 The performance of improved SSD detection
Fig. 7
A photo has two rectangular structures with various textures, with corners marked by squares indicating the results of a lightweight S S D detector corner area detection.

Lightweight SSD detector corner area detection results

The second stage is a localization error analysis: the deviation between the detection results and the image calibration results is analyzed statistically, and a normal distribution is fitted to the deviations. The maximum error values at the 95% and 90% confidence levels are taken as the maximum error of the calibration results, and the actual error values are calculated, with results shown in Table 4.
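A sketch of this error-analysis step, assuming the "maximum error at a confidence level" is the symmetric bound |μ| + z·σ of the fitted normal distribution; the z-quantiles are standard, while any sample deviations fed to the function are illustrative:

```python
import math

def max_error(deviations, z):
    """Fit a normal distribution (sample mean and standard deviation)
    to the measured deviations and return the bound |mu| + z * sigma,
    the maximum error at the confidence level implied by z."""
    n = len(deviations)
    mu = sum(deviations) / n
    sigma = math.sqrt(sum((d - mu) ** 2 for d in deviations) / n)
    return abs(mu) + z * sigma

Z_90, Z_95 = 1.645, 1.960  # standard normal quantiles for 90% / 95%
```

With this convention, a wider confidence level (larger z) always yields a larger reported maximum error, matching the ordering of the 90% and 95% columns in Table 4.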

Table 4 Positioning error of improved SSD detection

The standard dimensions of the container corner lock holes are 124 mm × 63.5 mm. The measurement method implemented in this study has an error distribution of 21.3 mm × 15.9 mm at a 95% confidence level. The final measurement accuracy of this method can satisfy the practical application requirements for container pose measurement tasks.

4 Conclusion

This paper proposes a fast container pose measurement device and method for automated loading and unloading. By compressing and optimizing traditional deep learning networks and integrating container appearance features, precise detection and tracking of container corners are achieved. This system reduces the single detection time by 5.35 ms, with a high detection rate of up to 90%, capable of achieving a positioning error between 14.3 and 19.6 mm at a frame rate of 10 fps. This research paves the way for further advancements in port automation, potentially fostering more efficient, safer, and cost-effective operations through the integration of sophisticated detection and tracking technologies in container handling processes.