Accurate real-time obstacle detection of coal mine driverless electric locomotive based on ODEL-YOLOv5s

Yang, Tun; Wang, Shuang; Tong, Jiale; Wang, Wenshan

doi:10.1038/s41598-023-44746-8

Accurate real-time obstacle detection of coal mine driverless electric locomotive based on ODEL-YOLOv5s

Article
Open access
Published: 14 October 2023

Volume 13, article number 17441, (2023)
Cite this article

Download PDF

You have full access to this open access article

Scientific Reports

Accurate real-time obstacle detection of coal mine driverless electric locomotive based on ODEL-YOLOv5s

Download PDF

Tun Yang^1,2,
Shuang Wang^1,2,3,
Jiale Tong^1,2 &
…
Wenshan Wang^1,2

687 Accesses
1 Citation
1 Altmetric
Explore all metrics

Abstract

The accurate identification and real-time detection of obstacles have been considered the premise to ensure the safe operation of coal mine driverless electric locomotives. The harsh coal mine roadway environment leads to low detection accuracy of obstacles based on traditional detection methods such as LiDAR and machine learning, and these traditional obstacle detection methods lead to slower detection speeds due to excessive computational reasoning. To address the above-mentioned problems, we propose a deep learning-based ODEL-YOLOv5s detection model based on the conventional YOLOv5s. In this work, several data augmentation methods are introduced to increase the diversity of obstacle features in the dataset images. An attention mechanism is introduced to the neck of the model to improve the focus of the model on obstacle features. The three-scale prediction of the model is increased to a four-scale prediction to improve the detection ability of the model for small obstacles. We also optimize the localization loss function and non-maximum suppression method of the model to improve the regression accuracy and reduce the redundancy of the prediction boxes. The experimental results show that the mean average precision (mAP) of the proposed ODEL-YOLOv5s model is increased from 95.2 to 98.9% compared to the conventional YOLOv5s, the average precision of small obstacle rock is increased from 89.2 to 97.9%, the detection speed of the model is 60.2 FPS, and it has better detection performance compared with other detection models, which can provide technical support for obstacle identification and real-time detection of coal mine driverless electric locomotives.

Track Obstacle Real-Time Detection of Underground Electric Locomotive Based on Improved YOLOX

Obstacle detection in single images with deep neural networks

Article 18 December 2015

DW-YOLO: An Efficient Object Detector for Drones and Self-driving Vehicles

Article 09 May 2022

Introduction

The coal mine rail electric locomotive, one of the major pieces of equipment in coal mine production, undertakes the transportation tasks (e.g., underground operators and gangue), which is a vital link to determining the efficient production of coal mines¹. The driverless technology of coal mine rail electric locomotives has aroused wide attention as coal mine intelligence leaps forward². The driverless technology of electric locomotives is capable of avoiding electric locomotive collisions, derailments, and other accidents caused by the poor coal mine roadway environment and the uncertainty of manual driving. Moreover, this technology is capable of achieving the purpose of reducing personnel and increasing efficiency, thus having significant economic benefits.

The driverless technology of electric locomotives is essentially designed to identify and detect obstacles that may appear in front of the locomotive for autonomous obstacle avoidance. The existing research on the driverless technology of electric locomotives in coal mines worldwide remains in the preliminary stage. The obstacle detection technology in complex coal mine roadway environments is not mature. A few domestic coal mines realize the obstacle detection of electric locomotives based on LiDAR technology and, at the same time, realize electric locomotives that are driverless with remote monitoring and an intelligent dispatching system. However, the processing process based on LiDAR detection is complicated and vulnerable to environmental effects^3,4. Narrow coal mine roadway space and uneven roadway walls will make the data collected by LiDAR have more noise, thus resulting in an unsatisfactory detection effect. In contrast, image processing techniques-based deep learning target detection algorithms automatically learn image features based on convolutional neural networks (CNN), which have been gradually used in extensive target detection tasks in coal mines for their higher detection accuracy and faster detection speed. Xu et al. proposed a CAP-YOLO algorithm for intelligent monitoring of pedestrians in coal mines through the lightweight improvement of the YOLOv3 algorithm, thus effectively increasing the reasoning speed of the algorithm and detecting pedestrians in real-time⁵. Liu et al. applied the optimized YOLOv4 model to the intelligent separation task of coal and gangue, thus increasing the recognition accuracy and speed⁶. Cai et al. proposed a deep learning-based detection algorithm for granularity detection and identification of mine-dumped materials, aggregates, and clays with good detection results⁷. Li et al. proposed a TMRC-SSD algorithm to identify and detect gangue, bolt, stick, and other impurities mixed in raw coal, thus ensuring the safe production and mining efficiency of coal⁸. It can be seen that the above scholars have improved the deep learning target detection algorithms to improve the detection accuracy and speed of the targets in different scenarios to varying degrees and realize accurate real-time detection. Therefore, applying deep learning detection algorithms to the obstacle detection of driverless electric locomotives will be helpful in improving the detection accuracy of obstacles and guaranteeing real-time detection, which is of great significance for improving the safe operation of driverless electric locomotives.

Deep learning target detection algorithms are classified into two types (including two-stage and one-stage), among which two-stage algorithms include FAST-RCNN⁹, MASK-RCNN¹⁰, etc., and one-stage detection algorithms include Single Shot Detector (SSD)¹¹ and You Only Look Once (YOLO) series^12,13,14,15, and so on. Two-stage detection algorithms have a deep network structure and a large amount of computation, resulting in slower detection speeds for these algorithms, while one-stage algorithms have a relatively simple network structure and fast detection speeds, making them suitable for scenarios that require high detection speeds. Since the two-stage detection algorithm cannot meet real-time detection^16,17,18, this study adopts YOLOv5s, which has a faster detection speed, as the detection model. The model is optimized according to the characteristics of insufficient light, partial occlusion, and a large number of small obstacles in the coal mine roadway to improve the detection accuracy of the model, and the ODEL-YOLOv5s model is proposed. The main contributions of this study are as follows:

1.
Mixup, Cutmix, Cutout, and Mosaic data augmentation methods are introduced to expand the dataset and improve the diversity of obstacle features.
2.
The Convolutional Block Attention Module (CBAM) is introduced to the neck of the model to focus on interesting features and increase the receptive field of the model.
3.
A small target prediction layer is added to the model to enhance the detection ability of the model for small obstacles.
4.
The localization loss function and non-maximum suppression method of the model are optimized to improve the regression and reduce the redundancy of the prediction boxes.

YOLOv5 model description

Detection principle of YOLO algorithm

The YOLO algorithm is capable of simplifying the problem of target detection into a regression problem¹⁹. Its core idea is to take the whole image as the input of the model, extract the input image features on CNN multi-scale, further fuse the extracted features, and then send the image features to the prediction layer for bounding box regression and category probability prediction. The YOLO algorithm can divide the input image evenly into N × N squares. B bounding boxes, and the confidence scores of the above bounding boxes will be predicted in the respective square. If the center of an obstacle in the image is located in a square, this square is ultimately responsible for the prediction of such an obstacle²⁰.

YOLOv5 model structure

There are four versions of the YOLOv5 model, i.e., YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. To be specific, the YOLOv5s model is the lightest version in the YOLOv5 series; the other versions are deeper and wider based on the YOLOv5s model. As depicted in Fig. 1, the YOLOv5 model comprises three parts (including the Backbone, Neck, and Head). In the Backbone part, the Focus module is first adopted to slice the input image. Subsequently, the image is concatenated, and then a series of convolution operations are performed, thus effectively reducing the calculation cost and increasing the speed of extracting features. In the Neck part, the YOLOv5 model adopts Spatial Pyramid Pooling (SPP)²¹ and Path Aggregation Network (PANet)²². The SPP module is capable of extracting fixed-size feature vectors from different-scale feature maps to solve the problem of different sizes of input images, thus effectively increasing the receptive field of the model. Under the PANet module, path expansion and aggregation are conducted to enhance the bottom-up path, thus making bottom-level information easier to propagate and shortening the information path between low-level and high-level features. The YOLOv5 model draws lessons from Cross Stage Partial Network (CSPNet)²³ while designing CSP₁-n and CSP₂-n modules in accordance with whether there are residual components. The CSP₁-n is applied to the Backbone of the model, and the CSP₂-n is applied to the Neck of the model. In the Head part, the YOLOv3 detection head is employed by the YOLOv5 model for multi-scale prediction.

Loss function

The loss function is the difference between the predicted value and the true value of the model. The smaller the value of the loss function, the higher the prediction accuracy of the model will be and the better the classification effect will be. The loss of the YOLOv5 model comprises three parts, namely classification loss, confidence loss, and localization loss.

The IoU loss function (L_IoU) is usually employed as the localization loss of the model in the field of target detection, i.e., the bounding box regression loss. As depicted in Fig. 2, the IoU represents the ratio of the overlapping area of the prediction box (Pre) and ground truth box (GT) to the area of union of the two boxes, which is adopted to measure the overlap ratio between the Pre and the manually marked GT; the larger the value of the IoU, the more accurate the predicted obstacle area will be.

The GIoU loss function (L_GIoU) serves as the bounding box regression loss in the conventional YOLOv5 model. The L_GIoU considers the minimum closure rectangular area of the Pre and the GT based on the L_IoU to avoid the problem that the Pre and the GT have no overlapping area, thus making the value of the IoU approach zero and causing the model training to fail to converge. The L_GIoU calculation equation is presented as follows, where A_c denotes the minimum closure rectangular area of the Pre and the GT, and U expresses the union area of the Pre and the GT.

$$ {\text{L}}_{\text{GIoU}} = 1 - {\text{GIoU}} $$

(1)

$$ {\text{GIoU}} = {\text{IoU}} - \frac{{A_{c} - U}}{{A_{c} }} $$

(2)

Methods

Compared with the other three versions of YOLOv5 models, the YOLOv5s model is small in size, low in computation, and faster in detection, thus making it suitable for deployment on embedded devices with limited resources²⁴. In this study, the YOLOv5s model is employed as the research object, and the detection performance of the model in the harsh coal mine roadway environment is enhanced by different improvement measures.

Data augmentation

Data augmentation aims to expand the dataset and enrich its characteristics so that the model is more robust to images obtained in different environments. Data augmentation methods are primarily divided into two types, i.e., photometric distortion-based and geometric distortion-based. In this study, the saturation, exposure, and hue of the dataset images are adjusted for the photometric distortion data augmentation method. For the geometric distortion data augmentation method, Mixup²⁵, Cutout, Cutmix²⁶, and Mosaic data augmentation are also adopted to process the images, besides cropping, shearing, rotating, and flipping the images. To be specific, the Mixup is to randomly mix and blur two samples in the dataset. The Cutout randomly removes some areas in a sample and fills them with zero-pixel values. The Cutmix randomly removes some regions of one sample and then randomly fills in the regional pixel values of other samples in the dataset. The Mosaic data augmentation aims to randomly select four pictures from the dataset for random cutting, scaling, and splicing, thus effectively expanding the characteristics of small targets and increasing the robustness of the model. Figure 3 presents the data augmentation effect of the Mixup, Cutout, Cutmix, and Mosaic.

Introduce CBAM attention mechanism

The attention mechanism helps the model locate the information of interest from the images and select the information more critical to the current task targets from the numerous sources of information. The Convolutional Block Attention Module (CBAM)²⁷ refers to a lightweight attention module that can be integrated into any CNN architecture and trained end-to-end with basic CNN. Given an intermediate feature map, CBAM sequentially extracts attention features along the two dimensions of channel and space and then combines the extracted attention feature map with the original feature map to refine the adaptive features. These features are employed as the input for the next convolution layer. In this study, the CBAM attention mechanism is introduced into the conventional YOLOv5s model, so the model can focus more on the obstacle region in the images at the training stage and increase the detection accuracy of the model. The CBAM structure is illustrated in Fig. 4.

Add small target prediction layer

The conventional YOLOv5s model adopts three scales of detection layers for target perdition, and for the input 608 × 608 image, the feature map sizes on the three detection layers are 19 × 19, 38 × 38, and 76 × 76. The downsampling multiple of image features extracted by the YOLOv5s model is large, thus making it difficult for the deep feature map to learn small target feature information, which causes missed and inaccurate detection of small obstacles in conventional YOLOv5s. Accordingly, a small target prediction layer is introduced to the conventional YOLOv5s model, and the shallow feature map of 152 × 152 scale in Backbone and the deep feature map of 152 × 152 scale in Neck after feature fusion are concatenated to increase the detection effect of the model for small obstacles.

Improve IoU loss function

The L_GIoU employed in the conventional YOLOv5s model can also measure the prediction results when no overlapping area exists between the Pre and the GT. However, the L_GIoU has certain problems. In accordance with Eqs. (1)–(2), when there is an inclusion relationship between the Pre and the GT, the union area U of the Pre and the GT is the same as their minimum closure rectangular area A_c, and the GIoU is degenerated into the ordinary IoU. Thus, the relative positions of the Pre and the GT cannot be evaluated, causing slow convergence of the model in the training process.

To solve this problem, this study improves the L_GIoU to CIoU loss (L_CIoU), and a power index α is added to the L_CIoU to obtain α-CIOU loss (L_α-CIoU)²⁸. The L_CIoU considers the overlapping area, center point distance, aspect ratio, and penalty term of the Pre and the GT at the same time, making the bounding box regression more stable and achieving faster convergence. The equation of the L_CIoU is expressed as Eq. (3), where b and b^gt denote the centers of the Pre and the GT, respectively; ρ(b, b^gt) represents the center distance between the Pre and the GT; c expresses the diagonal length of the minimum closure area of the Pre and the GT; β represents a weight function; and v is adopted to measure the aspect ratio. The equation of β and v is expressed as Eqs. (4)–(5), where w and h are the width and height of the Pre, w^gt and h^gt are the width and height of the GT.

$$ {\text{L}}_{{{\text{CIoU}}}} = 1 - {\text{IoU}} + \left( {\frac{{\rho (b,b^{{{\text{gt}}}} )}}{c}} \right)^{2} + \beta \nu $$

(3)

$$ \beta = \frac{\nu }{{(1 - {\text{IoU}}) + \nu }} $$

(4)

$$ v = \frac{4}{{\pi^{2} }}\left( {\arctan \frac{{w^{{{\text{gt}}}} }}{{h^{{{\text{gt}}}} }} - \arctan \frac{w}{h}} \right)^{2} $$

(5)

By modulating the value of the α (α = 3 in this study), the detection layer of the model exhibits higher flexibility in bounding box regression, thus effectively increasing the accuracy of bounding box regression without additional training or reasoning time for the model. The calculation equation of the L_α-CIoU is expressed as Eq. (6).

$$ {\text{L}}_{\alpha - {\text{CIoU}}} = 1 - {\text{IoU}}^{\alpha } + \left( {\frac{{\rho (b,b^{\text{gt}} )}}{c}} \right)^{2\alpha } + \beta \nu $$

(6)

Optimize the non-maximum suppression method

The conventional YOLOv5s model adopts weighted non-maximum suppression (NMS) to remove redundant prediction boxes and retain the box with the highest confidence in the prediction stage. Compared with the conventional NMS method, the weighted NMS method can effectively improve prediction accuracy by weighting and summing up the information in all prediction boxes. However, the weighted NMS method has high computational complexity and low computational efficiency. Moreover, in the practical application of the harsh coal mine roadway environment, when the obstacles are dense or partially blocked, the effect of using the weighted NMS method is not ideal. Therefore, this study adopts the Cluster-NMS method as the non-maximum suppression processing method of the ODEL-YOLOv5s model. The Cluster-NMS algorithm exhibits a fast reasoning speed and better suppression effect of redundant prediction boxes, and it will not excessively increase the number of iteration rounds with the increase in the number of detected targets, thus effectively reducing the time complexity²⁹.

The structure of the ODEL-YOLOv5s model proposed in this study is illustrated in Fig. 5. The introduced CBAM attention mechanism is added to the Neck of the model (as shown in the red box in Fig. 5). Second, the 76 × 76 feature map is upsampled by twofold at the end of the Neck, and subsequently concatenated with the feature map processed by the first CSP₁ module of the Backbone, resulting in the addition of a new 152 × 152 prediction scale at the Head. The number of layers in the optimized YOLOv5s model is increased from the original 224 layers to 281 layers, and the weight size and reasoning time of the model are increased. The pseudocode for improving the model is shown in Table 1.

Table 1 Pseudocode of improving the YOLOv5s model.

Full size table

Experiment

Making dataset

In this study, the obstacle detection image dataset required by the experiment is self-made, and the obstacles include miners, electric locomotives, and rocks. The special explosion-proof camera for coal mines is used to shoot on the transportation roadway about 750 m deep under Yuandian No. 1 coal mine in Huaibei City, Anhui Province. In order to increase the diversity of the dataset, the different positions, distances, angles, lighting conditions, and occlusion of obstacles were considered during shooting. A total of 2000 dataset images are obtained after post-processing, and some of the images of the dataset are shown in Fig. 6.

The LabelImg labeling tool is used to label the obstacles in the dataset images; that is, all the obstacles contained in the images were manually selected by the minimum enclosing rectangle and labeled with “Miner”, “E-L”, and “Rock”, respectively. Lastly, the labeled dataset is divided into training set and test set in a ratio of 3:1.

Model evaluation indicators

In this study, Precision, Recall, AP (average precision) and mAP (mean average precision), average detection speed, weight size and amount of calculation are employed as the indicators to evaluate model detection performance. To be specific, the Precision is adopted to measure whether the model is accurate in obstacle detection, and the Recall reflects whether the model is comprehensive in obstacle detection. The AP represents the average detection accuracy of the model for a class of obstacles, and the mAP refers to the average detection accuracy of the model for all kinds of obstacles. The calculation formulas are shown in (7)–(10), where TP denotes the number of positive samples correctly predicted by the model, FP represents the number of positive samples incorrectly predicted by the model, FN expresses the number of negative samples incorrectly predicted by the model, and P_i is the corresponding accuracy rate of n positive samples of a certain class of obstacles^30,31.

$$ {\text{Precision}} = \frac{{{\text{TP}}}}{{\text{TP + FP}}} $$

(7)

$$ {\text{Recall}} = \frac{{{\text{TP}}}}{{\text{TP + FN}}} $$

(8)

$$ {\text{AP}} = \frac{1}{n}\sum\limits_{i = 1}^{n} {P_{i} } = \frac{{P_{1} + P_{2} + \ldots + P_{n} }}{n} $$

(9)

$$ {\text{mAP}} = \frac{{\sum\limits_{i = 1}^{n} {{\text{AP}}_{i} } }}{n} $$

(10)

Model training

The CPU of the computer employed in this experiment is an AMD Ryzen 7 5800X, and the main frequency is 3.8 GHz. The GPU is an NVIDIA GeForce RTX 3060 with 12 GB of video memory. The computer uses the Ubuntu 18.04 operating system, equipped with Cuda 11.0 and Cudnn 8.04 for GPU acceleration. Furthermore, the detection model is built on Python 1.7 and Python 3.6.

Before the training of the model, the training parameters of the configuration file are adjusted to yield the optimal model. The input image size is set to 608 × 608, the iteration batch size is 16, the momentum is 0.937, the decay is 0.0005, the learning rate is 0.1, and the number of iterations is set to 300.

The training curve changes of the ODEL-YOLOv5s model and the conventional YOLOv5s model are presented in Fig. 7. The ODEL-YOLOv5s model has more stable curve changes, smaller loss values, and higher Precision, Recall, and mAP compared with the conventional YOLOv5s model.

Ablation experiment

In this study, a variety of improvement measures are adopted to enhance the detection performance of the YOLOv5s model for obstacles in the harsh coal mine roadway environment. In order to further verify the effectiveness of each improvement measure, the effect of each improvement measure is analyzed on the detection performance of the model through ablation experiments, and the experimental results are shown in Table 2.

Table 2 Ablation Experiment.

Full size table

Model A is the YOLOv5s model without any improvement measures, and its mAP is 95.2%; the AP of small obstacle rock accounts for 89.2%; and the average detection speed reaches 76.4 FPS.

Model B performs data augmentation processing based on model A. Compared with model A, the mAP of model B is increased by 1.0%, the AP of small obstacle rock is increased by 2.4%, and the average detection speed is increased by 0.8 FPS.

Model C adds the CBAM attention mechanism based on model B. Compared with model B, the mAP of model C is increased by 0.7%, and the AP of small obstacle rock is increased by 1.8%. The addition of CBAM slightly increases the amount of calculation in the model, resulting in its average detection speed falling to 71.5 FPS.

Model D adds a small target detection layer based on model C, so the model is improved from the original three-scale prediction (3-pre) to four-scale prediction (4-pre). Compared with model C, the mAP of model D is increased by 1.0%, and the AP of small obstacle rock is increased by 2.7%. Due to the increase in the number of layers and reasoning time of the model, the average detection speed of the model is reduced to 58.6 FPS.

Model E improves the loss function based on model D. Compared with model D, the mAP of model E is increased by 0.4%, the AP of small obstacle rock is increased by 0.1%, and the average detection speed does not change.

Model F uses Cluster-NMS based on model E. Compared with model E, the mAP of model F is increased by 0.6%, the AP of small obstacle rock is increased by 1.7%, and the average detection speed is increased by 1.6 FPS. Model F integrates all the improvement measures in this study, and each improvement measure improves the detection accuracy of the model to varying degrees.

Results and discussion

To visualize the difference in detection results between ODEL-YOLOv5s and other mainstream target detection algorithms, these algorithms are trained and tested based on the same self-made dataset, and the detection results are listed in Table 3: The large number of computations (GFLOPs) in the YOLOv3 and YOLOv4 algorithms increases the inference time of the algorithms, resulting in an average detection speed of only 26.1 FPS and 20.5 FPS for YOLOv3 and YOLOv4, respectively, which is difficult to meet the needs of real-time detection in a harsh coal mine roadway environment. The average detection speed of YOLOV3-Tiny and YOLOV4-Tiny is 162.8 FPS and 156.2 FPS, respectively, but due to the simpler network structure of these two algorithms, they have limited ability to extract image features during the training process, resulting in poor detection of small obstacle rock. The mAP of YOLOv5x, YOLOv5l, and YOLOv5m all reach above 96%, but the weight size and calculation of the model are large, and the requirements for hardware equipment are high. For the novel versions of YOLOv6 and YOLOv7, the mAP of YOLOv6 is 92.9%, which is lower than the YOLOv5 series algorithms, the average detection speed of YOLOv6 is 33.7 FPS lower than YOLOv5s, and the weight size of the algorithms as well as the amount of computation is higher than YOLOv5s. The AP of YOLOv7 reaches 98.6% and 99.1% for large obstacle miner as well as electric locomotive, respectively, but the detection accuracy for small obstacle rock is only 71.3%, suggesting that the YOLOv7 is poor at detecting small obstacles. The mAP of the ODEL-YOLOv5s model proposed in this study reaches 98.9%, and the AP of small obstacle rock reaches 97.9%, which are 3.7% and 8.7% higher than the conventional YOLOv5s model, respectively. Compared with the conventional YOLOv5s model, the weight size of the ODEL-YOLOv5s model increases by 1.5 Mb, and the calculation amount increases by 11.3 GFLOPs, which are 15.9 Mb and 27.6 GFLOPs, respectively, but it is still at a low level compared with other algorithms in Table 3. Compared with the conventional YOLOv5s model, the average detection speed of the ODEL-YOLOv5s model is reduced by 16.2 FPS to 60.2 FPS, but it still meets the requirements of real-time detection. In summary, the ODEL-YOLOv5s model has higher average detection accuracy and better detection capability for small targets under the condition of ensuring real-time detection, and the small size of the model makes it easy to be deployed and used in mobile and embedded devices. The ODEL-YOLOv5s model proposed in this study has better detection performance than other algorithms.

Table 3 Comparative experimental results.

Full size table

In order to verify the detection effect of the ODEL-YOLOv5s model in real coal mine roadway scenarios, the detection results of the conventional YOLOv5s model and the ODEL-YOLOv5s model are compared and analyzed. The comparison of detection results between the two models is illustrated in Fig. 8, in which the image numbers A1-H1 represent the detection results of the conventional YOLOv5s model and A2-H2 are the detection results of the ODEL-YOLOv5s model. The comparison is carried out using the environmental scenarios of insufficient light, partial occlusion, small obstacles, and dense obstacles.

For the detection effect of the conventional YOLOv5s model, the obstacles of the miner and rock are missed in an environment with insufficient light, as shown in images A1-B1. As depicted in images C1-D1, the partially blocked miner is missed in image C1, and the partially blocked electric locomotive is mistakenly detected as a miner in image D1. As depicted in images E1-F1, missed detection occurs in the images of multiple small obstacle rocks. According to the images G1-H1, a redundant prediction box appears in the detection of the miner in the case of relatively dense obstacles.

For the detection effect of the ODEL-YOLOv5s model, as shown in images A2-H2. Due to the specific improvements made to the model according to the environmental characteristics of the coal mine roadway in this study, the detection ability of the model for obstacles is enhanced. Therefore, the obstacles can be detected accurately in the different environmental scenarios in which they are located. The comparative detection results in Fig. 8 show that compared with the conventional YOLOv5s model, the ODEL-YOLOv5s model has high robustness and realizes the accurate identification of obstacles in coal mine roadways in harsh environments, which also further verifies the validity of the improvement schemes used in this study.

To further validate the detection ability of the ODEL-YOLOv5s model, the images are collected in the Guqiao coal mine in Huainan City, Anhui Province, and a test set of 1000 images is established. The conventional YOLOv5s and ODEL-YOLOv5s models are tested on the 1000 images, respectively, and the total number of obstacles detected by the two models is counted. The detection results are shown in Table 4. The total number of obstacles contained in the 1000 test set images is 4212, and the total number of obstacles correctly detected by the conventional YOLOv5s is 4015, with 35 false detections and 162 missed detections, while the ODEL-YOLOv5 correctly detects 4186 obstacles, with 8 false detections and 18 missed detections. The detection results show that the number of correct detections in the ODEL-YOLOv5s model is 171 more than in the conventional YOLOv5s. The number of false detections and missed detections is 27 and 144 less than the conventional YOLOv5s, respectively, which validates that the proposed ODEL-YOLOv5s model still has high robustness in other coal mine roadway scenarios.

Table 4 Comparative experimental results in another coal mine.

Full size table

Conclusion

The obstacle identification and detection of existing driverless electric locomotives in the harsh environment of coal mine roadways is subject to the problems of low detection accuracy and slow speed. In accordance with the characteristics of obstacles in coal mine roadways, the ODEL-YOLOv5s model is proposed for obstacle detection in coal mine driverless electric locomotives. The main conclusions are as follows: (1) Increasing the diversity of obstacle features by image augmentation processing on the self-made dataset can effectively increase the detection accuracy of the model. (2) Introducing the CBAM attention mechanism in the model to redistribute the weights of target and non-target regions in the image makes the model more focused on the obstacle information, thus further increasing the detection accuracy. (3) Adding a small target detection layer in the model to further integrate the shallow features with the deep features, which effectively facilitates the detection of small obstacles in the model. (4) By optimizing the loss function and the non-maximum suppression method, the regression accuracy of the prediction boxes is improved, and the redundancy of the prediction boxes is effectively reduced. (5) The mAP of the proposed ODEL-YOLOv5s model is increased from 95.2 to 98.9% compared to the conventional YOLOv5s; the AP of small obstacle rock is increased from 89.2 to 97.9%; the detection speed of the model is 60.2 FPS; and the weight size and computation amount are 15.9 Mb and 27.6 GFLOPs, respectively. Compared with the existing mainstream target detection algorithms, the ODEL-YOLOv5s model shows the advantages of high detection accuracy, fast detection speed, and small model memory, which can effectively ensure the safe operation of coal mine driverless electric locomotives.

Data availability

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request.

References

Han, J. et al. Driverless technology of underground locomotive in coal mine. J. China Coal Soc. 45(6), 2104–2115. https://doi.org/10.13225/j.cnki.jccs.ZN20.0343 (2020).
Article Google Scholar
Wang, G. et al. Research and engineering progress of intelligent coal mine technical system in early stages. Coal Sci. Technol. 48(7), 1–27. https://doi.org/10.13199/j.cnki.cst.2020.07.001 (2020).
Article Google Scholar
Lee, W., Kang, M.-H., Song, J. & Hwang, K. The Design of preventive automated driving systems based on convolutional neural network. Electronics 10(14), 1737. https://doi.org/10.3390/electronics10141737 (2021).
Article Google Scholar
Xu, X., Chen, X., Wu, B., Wang, Z. & Zhen, J. Exploiting high-fidelity kinematic information from port surveillance videos via a YOLO-based framework. Ocean Coastal Manag. 222, 106117. https://doi.org/10.1016/j.ocecoaman.2022.106117 (2022).
Article Google Scholar
Xu, Z., Li, J., Meng, Y. & Zhang, X. CAP-YOLO: Channel attention based pruning YOLO for coal mine real-time intelligent monitoring. Sensors 22, 4331. https://doi.org/10.3390/s22124331 (2022).
Article ADS PubMed PubMed Central Google Scholar
Liu, Q., Li, J., Li, Y. & Gao, M. Recognition methods for coal and coal gangue based on deep learning. IEEE Access 9, 77599–77610. https://doi.org/10.1109/ACCESS.2021.3081442 (2021).
Article Google Scholar
Cai, Z., Lei, S. & Lu, X. Deep learning based granularity detection network for mine dump materials. Minerals 12(4), 424. https://doi.org/10.3390/min12040424 (2022).
Article ADS CAS Google Scholar
Li, D., Meng, G., Sun, Z. & Xu, L. Autonomous multiple tramp materials detection in raw coal using Single-Shot feature fusion detector. Appl. Sci. 12(1), 107. https://doi.org/10.3390/app12010107 (2022).
Article CAS Google Scholar
Girshick, R. Fast R-CNN. IEEE International Conference on Computer Vision, pp 1440–1448 (2015). https://doi.org/10.1109/ICCV.2015.169
He, K., Gkioxari, G., Dollár, P. & Girshick, R. Mask R-CNN. IEEE International Conference on Computer Vision, pp 2980–2988 (2017). https://doi.org/10.1109/ICCV.2017.322
Liu, W. et al. SSD: single shot multibox detector. European Conference on Computer Vision, pp 21–37 (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Redmon, J., Divvala, S., Girshick, R. & Farhadi, A. You only look once: unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp 779–788 (2016). https://doi.org/10.1109/CVPR.2016.91
Redmon, J., Farhadi, A. YOLO9000: Better, faster, stronger. IEEE Conference on Computer Vision and Pattern Recognition, pp 6517–6525 (2017). https://doi.org/10.1109/CVPR.2017.690
Redmon, J., Farhadi, A. YOLOv3: An incremental improvement. IEEE Conference on Computer Vision and Pattern Recognition, (2018). https://arxiv.org/abs/1804.02767
Bochkovskiy, A., Wang, C., Liao, H. YOLOv4: Optimal speed and accuracy of object detection. IEEE Conference on Computer Vision and Pattern Recognition, (2020). https://doi.org/10.48550/arXiv.2004.10934
Zhou, Y., Bai, Y. & Chen, Y. Multiframe CenterNet heatmap ROI aggregation for real-time video object detection. IEEE Access 10, 54870–54877. https://doi.org/10.1109/ACCESS.2022.3174195 (2022).
Article Google Scholar
Wang, F. et al. An approach based on 1D fully convolutional network for continuous sign language recognition and labeling. Neural Comput. Appl. 34(20), 17921–17935. https://doi.org/10.1007/s00521-022-07415-x (2022).
Article Google Scholar
Guo, C., Lv, X., Zhang, Y. & Zhang, M. Improved YOLOv4-tiny network for real-time electronic component detection. Sci. Rep. 11, 22744. https://doi.org/10.1038/s41598-021-02225-y (2021).
Article ADS CAS PubMed PubMed Central Google Scholar
Gai, R., Chen, N. & Yuan, H. A detection algorithm for cherry fruits based on the improved YOLO-v4 model. Neural Comput. Appl. 35(19), 13895–13906. https://doi.org/10.1007/s00521-021-06029-z (2021).
Article Google Scholar
Chen, J., Jia, K., Chen, W., Lv, Z. & Zhang, R. A real-time and high-precision method for small traffic-signs recognition. Neural Comput. Appl. 34, 2233–2245. https://doi.org/10.1007/s00521-021-06526-1 (2022).
Article Google Scholar
He, K., Zhang, X., Ren, S. & Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transact. Pattern Anal. Mach. Intell. 37(9), 1904–1916. https://doi.org/10.1109/TPAMI.2015.2389824 (2015).
Article Google Scholar
Liu, S., Qi, L., Qin, H., Shi, J. & Jia, J. Path aggregation network for instance segmentation. Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp 8759–8768 (2018). https://doi.org/10.48550/arXiv.1803.01534
Wang, C., Liao, H., Wu, Y., Chen, P., Hsieh, J. & Yeh, I. CSPNet: a new backbone that can enhance learning capability of CNN. Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp 1571–1580 (2020). https://doi.org/10.1109/CVPRW50498.2020.00203
Zhao, Z. et al. Real-time detection of particleboard surface defects based on improved YOLOV5 target detection. Sci. Rep. 11, 21777. https://doi.org/10.1038/s41598-021-01084-x (2021).
Article ADS CAS PubMed PubMed Central Google Scholar
Zhang, H., Cisse, M., Dauphin, Y. & Lopez-Paz, D. Mixup: Beyond empirical risk minimization. (2017). https://doi.org/10.48550/arXiv.1710.09412
Yun, S., Han, D., Chun, S., Oh, S., Yoo, Y. & Choe, J. Cutmix: Regularization strategy to train strong classifiers with localizable features. Proceedings of the IEEE/CVF International Conference on Computer Vision, pp 6022–6031 (2019). https://doi.org/10.1109/ICCV.2019.00612
Woo, S., Park, J., Lee, J. & Kweon, I. CBAM: Convolutional block attention module. Proceedings of the European Conference on Computer Vision, pp 3–19 (2018). https://doi.org/10.1007/978-3-030-01234-2_1
He, J. et al. Alpha-loU: A family of power intersection over union losses for bounding box regression. Adv. Neural Inform. Process. Syst. https://doi.org/10.48550/arXiv.2110.13675 (2021).
Article Google Scholar
Zheng, Z. et al. Enhancing geometric factors in model learning and inference for object detection and instance segmentation. IEEE Transact. Cybernet. 52(8), 8574–8586. https://doi.org/10.1109/TCYB.2021.3095305 (2022).
Article MathSciNet Google Scholar
Li, W., Liu, J. & Mei, H. Lightweight convolutional neural network for aircraft small target real-time detection in Airport videos in complex scenes. Sci. Rep. 12, 14474. https://doi.org/10.1038/s41598-022-18263-z (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Han, L. et al. A novel early warning strategy for right-turning blind zone based on vulnerable road users detection. Neural Comput. Appl. 34, 6187–6206. https://doi.org/10.1007/s00521-021-06800-2 (2022).
Article Google Scholar

Download references

Acknowledgements

This study was supported by the National Natural Science Foundation of China (Grant Number 51904007), the Collaborative Innovation Project of Universities in Anhui Province (Grant Number GXXT-2020-60), the Open Fund of State Key Laboratory of Mining Response and Disaster Prevention and Control in Deep Coal Mine (Grant Number SKLMRDPC20KF10), and the Graduate Innovation Fund of Anhui University of Science and Technology (Grant Number 2021CX2046).

Author information

Authors and Affiliations

State Key Laboratory of Mining Response and Disaster Prevention and Control in Deep Coal Mines, Anhui University of Science and Technology, Huainan, 232001, China
Tun Yang, Shuang Wang, Jiale Tong & Wenshan Wang
School of Mechanical Engineering, Anhui University of Science and Technology, Huainan, 232001, China
Tun Yang, Shuang Wang, Jiale Tong & Wenshan Wang
Collaborative Innovation Center for Mining Intelligent Technology and Equipment, Huainan, 232001, China
Shuang Wang

Authors

Tun Yang
View author publications
You can also search for this author in PubMed Google Scholar
Shuang Wang
View author publications
You can also search for this author in PubMed Google Scholar
Jiale Tong
View author publications
You can also search for this author in PubMed Google Scholar
Wenshan Wang
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

T.Y. and S.W. wrote the first draft of manuscript, J.T. and W.W. made datasets, analyzed data and revised the manuscript. All authors reviewed the manuscript.

Corresponding author

Correspondence to Shuang Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Yang, T., Wang, S., Tong, J. et al. Accurate real-time obstacle detection of coal mine driverless electric locomotive based on ODEL-YOLOv5s. Sci Rep 13, 17441 (2023). https://doi.org/10.1038/s41598-023-44746-8

Download citation

Received: 28 September 2022
Accepted: 11 October 2023
Published: 14 October 2023
DOI: https://doi.org/10.1038/s41598-023-44746-8
Springer Nature Limited

Accurate real-time obstacle detection of coal mine driverless electric locomotive based on ODEL-YOLOv5s

Abstract

Similar content being viewed by others

Track Obstacle Real-Time Detection of Underground Electric Locomotive Based on Improved YOLOX

Obstacle detection in single images with deep neural networks

DW-YOLO: An Efficient Object Detector for Drones and Self-driving Vehicles