Introduction

Lentinula Edodes logs are critical carriers for Lentinula Edodes cultivation and are frequently contaminated by various microorganisms during the cultivation process [1,2,3], causing substantial economic losses to enterprises. Currently, detecting contamination of Lentinula Edodes logs still relies on manual inspection, which is labor-intensive, inefficient, and demands highly trained inspectors; moreover, it usually detects contamination only after it has become obvious. Timely, accurate detection of initial contamination in Lentinula Edodes logs is therefore crucial for preventing its further spread, ensuring quality and improving yield.

In recent years, with the development of deep learning theory and edge cloud computing, detection algorithms based on deep learning have been widely adopted for their strong generalization and cross-scenario capabilities [4,5,6,7], and related needs for mobile edge computing and cloud computing have gradually emerged [8,9,10,11,12,13]. Many academic institutions and industry researchers have invested in edge cloud computing, and these studies have advanced both edge cloud computing and computer vision technology [14,15,16,17]. In this context, using deep learning to process crop disease images has gradually become a research hotspot [18,19,20]. Zu et al. [21] were the first to apply deep learning to Lentinula Edodes log contamination identification, proposing an improved ResNeXt-50 (32 × 4d) model. Their method fine-tunes the six fully connected layers of ResNeXt-50 (32 × 4d) to improve recognition accuracy, moving beyond inefficient and error-prone manual detection. However, the network structure is complex and detection efficiency is low, making the model unsuitable for deployment on mobile or embedded devices, although cloud computing could supply it with more computing resources. To address this, Zu et al. [22] proposed Ghost-YOLOv4 for Lentinula Edodes log contamination identification, replacing the backbone feature extraction network with the lightweight GhostNet. This lightweight approach is well suited to mobile edge computing, allowing real-time contamination identification on edge devices and alleviating the burden on cloud computing servers.
Although scholars have achieved certain results in Lentinula Edodes log contamination detection using deep learning techniques, existing methods perform poorly in the early stage of contamination, when the contaminated areas are small and small object detection is required. Small object detection has long been a difficult problem for object detection algorithms, and many scholars have studied it in depth [23,24,25,26,27]. However, no previous literature applies deep learning to small object Lentinula Edodes log contamination detection. This study combined cloud computing and edge computing to process Lentinula Edodes log data and designed an edge cloud computing framework for image enhancement and real-time detection on edge devices, as shown in Fig. 1. Edge devices with good network connectivity establish wireless or other data transmission connections with cloud servers. The cloud server receives and processes requests from edge devices and performs the corresponding algorithm computations. When receiving requests from multiple mobile edge devices, the cloud server computes them in parallel in the cloud and, once the computation is complete, returns real-time responses to the edge devices. The overall training and testing framework maintains detection accuracy for small object Lentinula Edodes log contamination under the resource constraints of edge computing.

Fig. 1

The edge cloud computing framework for image enhancement and real-time detection on edge devices

The YOLOv7 model proposed by Wang et al. [28] achieves both higher speed and higher accuracy than prior detectors on the COCO dataset, making it well suited to mobile edge computing and resource-constrained environments. This performance is significant for applications deployed in mobile edge computing systems and for services provided in cloud computing environments. In this study, we improved the YOLOv7 algorithm and propose SRW-YOLO, a model for small object Lentinula Edodes log contamination detection. First, SPD-Conv was introduced into the MP module of the network to highlight the features of small contaminated objects, which helps perform object detection more effectively on resource-constrained edge devices. Then, RepVGG was used to re-parameterize the ELAN structure in the backbone network, reducing the pressure on mobile edge inference and cloud server resources while further improving the detection of small object Lentinula Edodes log contamination. Finally, object locations were regressed using the WIoU loss function, which pays more attention to ordinary-quality anchor boxes, to improve the overall performance of contamination detection.

Related work

Cloud computing

As a paradigm of distributed computing, cloud computing decomposes large-scale data into sub-tasks through a network center, distributes them to a system composed of multiple servers for processing and analysis, and finally feeds the results back to the central node [29, 30]. Cloud computing combines the characteristics of distributed computing, parallel computing, and grid computing to build massive computing and storage clusters that provide users with scalable computing resources and storage space at low cost. Many companies now operate enterprise-level cloud computing platforms, such as Amazon, Alibaba, and Baidu. Compared with traditional application platforms, cloud computing platforms offer powerful computing power, virtually unlimited storage capacity, and convenient virtual services. However, for individuals and small companies, renting a cloud computing server involves additional costs. To reduce these costs, a variety of lightweight networks have been proposed for object detection. Common strategies include avoiding fully connected layers, reducing the number of channels and the convolution kernel size, and optimizing down-sampling, weight pruning, weight discretization, model representation, and encoding [31, 32]. However, there is still no small object Lentinula Edodes log contamination detection model suited to mobile edge computing and cloud computing environments.

Edge computing

The architecture of edge computing originated from the cloudlet [33] proposed by Carnegie Mellon University in 2009. In 2016, a Wayne State University team [34] formally defined edge computing and studied its application scenarios in depth. Subsequently, artificial intelligence solutions based on edge computing became a research hotspot. Hu et al. [35] proposed a face detection video surveillance system based on mobile edge computing (MEC), which uses different detection algorithms at the edge and in the cloud and decides whether a frame needs to be sent to the cloud based on the confidence of the edge detection. Jia [36] discussed the application prospects of edge computing models based on distributed data collection and processing in intelligent video detection. Wang et al. [37] proposed an online monitoring architecture for transmission lines based on the ubiquitous Internet of Things, building on deep learning-based image recognition and mobile edge computing. However, existing edge detection models in agriculture struggle to balance accuracy and real-time performance.

Crop disease detection

In agricultural production, ignoring the early signs of plant disease can lead to losses in food crops and, ultimately, serious economic damage. Anh et al. [38] introduced a multi-leaf classification model based on a benchmark dataset, utilizing a pre-trained MobileNet CNN model; their approach was efficient, achieving a reliable accuracy of 96.58%. In another study [39], a multi-label CNN was proposed for classifying various plant diseases, employing transfer learning with DenseNet, Inception, Xception, ResNet, VGG, and MobileNet; the authors claimed their research was the first to classify 28 classes of plant diseases using a multi-label CNN. An ensemble classifier was employed for plant disease classification in [40], evaluated on two datasets, PlantVillage and Taiwan Tomato Leaves. Pradeep et al. [41] presented the EfficientNet model, a convolutional neural network designed for multi-label and multi-class classification; the inclusion of a hidden layer in the CNN positively impacted the identification of plant diseases, but the model underperformed when validated on benchmark datasets. In [42], an effective, loss-fused, resilient convolutional neural network (CNN) was proposed using the benchmark PlantVillage dataset, achieving a notable classification accuracy of 98.93%. Despite its high classification accuracy, the model struggled with real-time image classification under varying environmental conditions.

Materials and methods

Data acquisition

The dataset used in this study was sourced from the database of the Smart Village Laboratory at Shandong Agricultural University, which was built specifically for this research. The data were collected from a factory culture shed in Shandong Province, China. A distinctive aspect of the collection process was the installation of LED strip lights at regular intervals within the Lentinula Edodes log culture shed. Using cloud computing technology, cultivators can monitor the lighting conditions in the shed in real time and remotely control the brightness and position of the LED strip lights to maintain a normal cultivation environment. The acquisition equipment consisted of two devices: a Canon EOS 600D camera and an iQOO 8 phone. The captured images ranged from 1900 to 4000 pixels in width and from 2200 to 4000 pixels in height. Based on the collected images, the logs were categorized into three distinct groups: normal Lentinula Edodes logs, Aspergillus flavus-contaminated Lentinula Edodes logs, and Trichoderma viride-contaminated Lentinula Edodes logs, as delineated in Fig. 2. The dataset also included images of Lentinula Edodes logs with small contaminated areas. A total of 3156 images were amassed, comprising 1734 images of normal logs, 700 images of Aspergillus flavus-contaminated logs, and 722 images of Trichoderma viride-contaminated logs.

Fig. 2

Example images of Lentinula Edodes logs. a, b normal, c, d Aspergillus flavus-contaminated, and e, f Trichoderma viride-contaminated

Data pre-processing

In deep learning, models must be trained on large amounts of data to avoid overfitting, so the adequacy and comprehensiveness of the dataset play a pivotal role in improving the accuracy of the proposed model. To widen the sample size, this study employed data augmentation. The augmentation strategy incorporated several operations, including angle rotation, saturation adjustment, exposure alteration, vertical flipping, and random cropping, as delineated in Fig. 3. These methods generate an enlarged pool of samples, enhancing the generalization capability and robustness of the model. The expanded dataset comprised 2988 images of normal Lentinula Edodes logs, 1912 images of Aspergillus flavus-contaminated logs, and 1512 images of Trichoderma viride-contaminated logs, for a total of 6412 images, each archived in JPG format.

Fig. 3

Renderings of data enhancements
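As an illustration, the flip, exposure, rotation, and random-crop operations described above can be sketched directly on pixel arrays. This is a simplified NumPy sketch, not the augmentation pipeline actually used in the study; saturation adjustment (an HSV-space operation) is omitted, and the 90° rotation and 1.3× exposure factor are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img: np.ndarray) -> list[np.ndarray]:
    """Generate simple augmented variants of an H x W x 3 uint8 image."""
    h, w = img.shape[:2]
    variants = []
    variants.append(np.rot90(img))    # 90-degree rotation
    variants.append(np.flipud(img))   # vertical flip
    # exposure adjustment: scale intensities and clip to the valid range
    variants.append(np.clip(img.astype(np.float32) * 1.3, 0, 255).astype(np.uint8))
    # random crop to 80% of the original size
    ch, cw = int(h * 0.8), int(w * 0.8)
    y0 = rng.integers(0, h - ch + 1)
    x0 = rng.integers(0, w - cw + 1)
    variants.append(img[y0:y0 + ch, x0:x0 + cw])
    return variants

sample = rng.integers(0, 256, size=(64, 48, 3), dtype=np.uint8)
augmented = augment(sample)
print(len(augmented))  # 4 variants per input image
```

Applying such a set of operations to each source image multiplies the sample count, which is how the 3156 collected images were expanded to 6412.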

Concurrently, an image annotation tool was used to label the data. The annotations were categorized into three distinct groups: Normal, Aspergillus flavus, and Trichoderma viride, and the label files were saved in YOLO format. To conclude the dataset's preparation, it was partitioned into a training set, a validation set, and a test set at a ratio of 8:1:1. Specifically, the training set encompassed 5130 images, while the validation and test sets each contained 641 images.
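An 8:1:1 split reproducing the stated counts can be sketched as follows (the file names and random seed are illustrative, not from the study):

```python
import random

def split_dataset(paths, val_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle file paths and partition them into train/val/test subsets."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)
    n = len(paths)
    n_val = round(n * val_frac)
    n_test = round(n * test_frac)
    train = paths[: n - n_val - n_test]
    val = paths[n - n_val - n_test : n - n_test]
    test = paths[n - n_test :]
    return train, val, test

images = [f"img_{i:04d}.jpg" for i in range(6412)]
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))  # 5130 641 641
```

Rounding the two smaller subsets first and giving the remainder to the training set yields exactly the 5130/641/641 partition reported above.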

SRW-YOLO model construction

In this study, an SRW-YOLO network model suitable for mobile edge computing and cloud computing environments was designed for small object Lentinula Edodes log contamination detection, as shown in Fig. 4. First, the MP module was improved with SPD-Conv to enhance the learning of small object features in Lentinula Edodes log images and avoid the loss of fine-grained information. Second, RepVGG was introduced into the ELAN structure, and structural re-parameterization was used to decouple the multi-branch structure used during training from the plain structure used during inference, further improving the efficiency and accuracy of inference on small contaminated regions. Finally, the original boundary regression loss function was replaced with the WIoU loss function, which weakens the influence of high-quality anchor boxes and low-quality sample features and focuses on ordinary-quality anchor boxes, making the model output more accurate. During the Lentinula Edodes log cultivation phase, the mobile device collects images and transmits them to the cloud data processing center, which generates the final detection image.

Fig. 4

The network structure of SRW-YOLO

MP module based on SPD-Conv

YOLOv7 uses an MP structure to downsample the input. Downsampling is usually implemented with convolutional layers, pooling layers, or convolutions with a stride greater than 1 to gradually reduce the spatial size of the input tensor and thus increase the receptive field of the network. However, during downsampling the resolution of Lentinula Edodes log images can decrease too quickly, losing information about the location and size of the contamination and reducing detection accuracy. To solve this problem, the MP module was improved by introducing SPD-Conv [43]. SPD-Conv consists of a space-to-depth (SPD) layer and a non-strided convolutional layer. The SPD layer downsamples a feature map inside the convolutional neural network while retaining all of its information: it slices an intermediate feature map \(X\left(S\times S\times {C}_{1}\right)\) into a series of sub-maps \({f}_{(x,y)}\):

$$\begin{array}{c}{f}_{0,0}=X\left[0:S:scale,0:S:scale\right],{f}_{1,0}=X\left[1:S:scale,0:S:scale\right],\dots ,\\ {f}_{scale-1,0}=X\left[scale-1:S:scale,0:S:scale\right];\end{array}$$
(1)
$$\begin{array}{c}{f}_{0,1}=X\left[0:S:scale,1:S:scale\right],{f}_{1,1},\dots ,\\ {f}_{scale-1,1}=X\left[scale-1:S:scale,1:S:scale\right];\end{array}$$
(2)
$$\begin{array}{c}\vdots \\ {f}_{0,scale-1}=X\left[0:S:scale,scale-1:S:scale\right],{f}_{1,scale-1},\dots ,\\ {f}_{scale-1,scale-1}=X\left[scale-1:S:scale,scale-1:S:scale\right]\end{array}$$
(3)

Given any (original) feature map \(X\), the sub-map \({f}_{x,y}\) consists of all entries \(X\left(i,j\right)\) for which \(i+x\) and \(j+y\) are divisible by scale.

Thus, each sub-map downsamples \(X\) spatially by a factor of scale. Finally, the sub-feature maps are concatenated along the channel dimension to obtain a feature map of size \(S/scale\times S/scale\times {scale}^{2}{C}_{1}\). Adding a non-strided convolution after the SPD transformation preserves as much of the discriminative feature information as possible. The SPD-Conv structure is shown in Fig. 5.

Fig. 5

Illustration of SPD-Conv when scale = 2
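The slicing in Eqs. (1)–(3) followed by channel-wise concatenation is a space-to-depth rearrangement, which can be written compactly with NumPy slicing. This sketch covers the SPD layer only; the non-strided convolution that follows it is omitted.

```python
import numpy as np

def space_to_depth(x: np.ndarray, scale: int = 2) -> np.ndarray:
    """Slice an (S, S, C1) feature map into scale**2 sub-maps and
    concatenate them along the channel axis, yielding a
    (S/scale, S/scale, scale**2 * C1) map with no information loss."""
    subs = [x[i::scale, j::scale, :] for j in range(scale) for i in range(scale)]
    return np.concatenate(subs, axis=-1)

x = np.arange(4 * 4 * 3, dtype=np.float32).reshape(4, 4, 3)
y = space_to_depth(x, scale=2)
print(x.shape, "->", y.shape)  # (4, 4, 3) -> (2, 2, 12)
```

Because every input entry lands in exactly one sub-map, spatial resolution is halved (for scale = 2) without discarding any fine-grained information, unlike a strided convolution or pooling layer.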

A total of five MP modules were constructed in the original model across the backbone network and the feature fusion network. Since the second branch of the MP module contains a stride-2 convolution, this study used SPD-Conv to replace the stride-2 convolutions in the MP modules of the feature fusion network, as shown in Fig. 6. Considering the large input image size, the number of parameters, and the computational efficiency of the model, not all stride-2 convolutions in the network were replaced.

Fig. 6

Improvement of MP module

RepVGG-based efficient aggregation network module

The efficient aggregation network module in the original model mainly comprises the ELAN [44] structure and the E-ELAN structure. ELAN uses a special skip-connection structure to control the longest gradient path, so that a deeper network can learn and converge efficiently. E-ELAN adds expansion, channel shuffling, and a transition layer architecture without destroying ELAN's original gradient path or changing the merging cardinality, enhancing the learning ability of the network. However, the efficient aggregation network module may assign important information to different groups, affecting model performance. In addition, it uses relatively few convolutional layers, which can be limiting when detecting small contaminated areas of Lentinula Edodes logs. Therefore, in this study, the efficient aggregation network module was improved using RepVGG [45]. RepVGG uses structural re-parameterization to decouple the multi-branch topology used in training from the single-path structure used in inference, as shown in Fig. 7. The re-parameterization proceeds in two steps: the first fuses each Conv2d with its BN (Batch Normalization) layer and converts branches containing only BN into a Conv2d; the second fuses the resulting 3 × 3 convolutional layers of the branches into a single convolutional layer. This structure increases the expressiveness of the model during training while reducing computation and memory usage during inference, which helps with small object contamination detection. The specific improvement in this study is to introduce the RepVGG module into all ELAN structures in the backbone network.

Fig. 7

Sketch of RepVGG architecture
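The first re-parameterization step, folding a BN layer into the preceding convolution, can be checked numerically. This is a minimal sketch using a 1 × 1 convolution (a per-channel matrix multiply); RepVGG additionally pads 1 × 1 and identity branches to 3 × 3 kernels and sums the branches, which is omitted here. All tensors are illustrative random values.

```python
import numpy as np

def fuse_conv_bn(w, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm parameters into the preceding conv's weight and bias.
    w: (C_out, C_in) 1x1-conv weight; BN statistics are per output channel."""
    std = np.sqrt(var + eps)
    w_fused = w * (gamma / std)[:, None]
    b_fused = beta - gamma * mean / std
    return w_fused, b_fused

rng = np.random.default_rng(0)
c_in, c_out = 4, 6
w = rng.standard_normal((c_out, c_in))
gamma, beta = rng.standard_normal(c_out), rng.standard_normal(c_out)
mean, var = rng.standard_normal(c_out), rng.random(c_out) + 0.5

x = rng.standard_normal(c_in)
# conv followed by BN, computed as two separate layers
y_ref = gamma * ((w @ x) - mean) / np.sqrt(var + 1e-5) + beta
# a single fused conv produces the same output
wf, bf = fuse_conv_bn(w, gamma, beta, mean, var)
y_fused = wf @ x + bf
print(np.allclose(y_ref, y_fused))  # True
```

Because the fused layer is exactly equivalent, inference skips the BN computation entirely, which is the source of RepVGG's reduced inference cost.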

Boundary regression loss function

In an object detection task, the bounding box regression loss function is critical to model performance. It measures the difference between the predicted bounding box and the ground-truth bounding box, which affects detection effectiveness. The Lentinula Edodes log dataset contains low-quality samples, such as small contaminated objects, and the geometric factors considered by traditional bounding box losses, such as distance and aspect ratio, aggravate the penalty on low-quality examples, which may reduce the generalization performance of the model. Therefore, this study used WIoUv3 [46] as the boundary regression loss function. WIoUv3 proposes the outlier degree, instead of IoU, to evaluate the quality of anchor boxes and provides a wise gradient gain allocation strategy. This strategy reduces the competitiveness of high-quality anchor boxes while minimizing the harmful gradients generated by low-quality examples, which speeds up model convergence and improves inference accuracy, thus improving overall detection performance. This is achieved by assigning each outlier degree \(\beta\) an appropriate gradient gain depending on its size: both smaller and larger values of \(\beta\) receive smaller gradient gains, so the loss focuses on ordinary-quality anchor boxes. The outlier degree \(\beta\) is defined as follows:

$$\beta =\frac{{L}_{IoU}^{*}}{\overline{{L }_{IoU}}}\in \left[0,+\infty \right)$$
(4)

where \({L}_{IoU}^{*}\) is the monotonic focusing factor and \(\overline{{L }_{IoU}}\) is its moving average with momentum \(m\).

A distance attention term was also constructed based on the distance metric, yielding WIoUv1 with a two-layer attention mechanism:

$$\begin{array}{c}{L}_{W\cdot IoUv1}={R}_{WIoU}{L}_{IoU}\\ {R}_{WIoU}={\text{exp}}\left(\frac{{\left(x-{x}_{gt}\right)}^{2}+{\left(y-{y}_{gt}\right)}^{2}}{{\left({W}_{g}^{2}+{H}_{g}^{2}\right)}^{*}}\right)\end{array}$$
(5)

where \({L}_{IoU}\) measures the degree of overlap between the predicted box and the ground-truth box; \((x,y)\) is the center of the predicted box; \(({x}_{gt},{y}_{gt})\) is the center of the ground-truth box; and \({W}_{g}\) and \({H}_{g}\) are the width and height of the smallest box enclosing the predicted and ground-truth boxes.

At this point, applying the outlier degree to \({L}_{W\cdot IoUv1}\) yields \({L}_{W\cdot IoUv3}\):

$${L}_{W\cdot IoUv3}=r{L}_{W\cdot IoUv1},r=\frac{\beta }{\delta {\alpha }^{\beta -\delta }}$$
(6)

where \({L}_{W\cdot IoUv1}\) is the attention-based boundary loss, and \(\delta\) and \(\alpha\) are hyperparameters.

When the outlier degree of an anchor box satisfies \(\beta =C\) (\(C\) is a constant), the anchor box obtains the highest gradient gain. Since \(\overline{{L }_{IoU}}\) is dynamic, the quality classification criterion for anchor boxes is also dynamic, which allows WIoUv3 to construct the gradient gain allocation strategy that best fits the current situation at each moment.
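The behavior of Eqs. (4) and (6) can be illustrated numerically. In this sketch, the \(\alpha\) and \(\delta\) values are assumed defaults (commonly cited for WIoUv3) and the \(L_{IoU}\) values are made up; neither is taken from this study's training configuration.

```python
def wiou_v3_gain(l_iou, l_iou_avg, alpha=1.9, delta=3.0):
    """Outlier degree (Eq. 4) and non-monotonic gradient gain (Eq. 6) of WIoUv3.
    l_iou: the anchor box's IoU loss; l_iou_avg: its running average."""
    beta = l_iou / l_iou_avg             # outlier degree
    r = beta / (delta * alpha ** (beta - delta))  # gradient gain
    return beta, r

# Ordinary-quality boxes receive the largest gain; both very high-quality
# (small beta) and very low-quality (large beta) boxes are down-weighted.
for l in (0.05, 0.3, 0.9):
    beta, r = wiou_v3_gain(l, l_iou_avg=0.1)
    print(f"L_IoU={l:.2f}  beta={beta:.1f}  r={r:.3f}")
```

With these assumed hyperparameters the gain peaks near \(\beta = \delta\) and decays for larger outlier degrees, which is exactly the "focus on ordinary-quality anchor boxes" behavior described above.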

Model training and evaluation

Model training

In this study, SRW-YOLO used the default hyperparameters of YOLOv7. The learning rate was set to 0.01, SGD was selected as the optimizer, and the momentum was set to 0.937. Meanwhile, a pre-trained model was used to assist training, helping the model achieve better initial performance. The configuration of the experimental environment is shown in Table 1.
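For reference, the stated optimizer settings (learning rate 0.01, momentum 0.937) correspond to the classic SGD-with-momentum update, sketched here on a toy quadratic loss rather than the actual training code:

```python
import numpy as np

LR, MOMENTUM = 0.01, 0.937  # values stated in the text

def sgd_momentum_step(w, grad, velocity):
    """One SGD-with-momentum update: v <- m*v + g, w <- w - lr*v."""
    velocity = MOMENTUM * velocity + grad
    return w - LR * velocity, velocity

w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(3):
    grad = 2 * w  # gradient of the toy loss ||w||^2
    w, v = sgd_momentum_step(w, grad, v)
print(w)  # weights shrink toward the minimum at 0
```

The high momentum value means past gradients dominate each update, smoothing the optimization trajectory across noisy mini-batches.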

Table 1 Experimental environment configuration

Model evaluation

To verify the performance of Lentinula Edodes log contamination detection, Precision, Recall, mAP, and FPS were used for evaluation in this study. The calculation equations are as follows.

$$Precision=\frac{TP}{TP+FP}$$
(7)
$$Recall=\frac{TP}{TP+FN}$$
(8)
$$AP={\int }_{0}^{1} {P}_{\left(r\right)}dr$$
(9)
$$mAP=\frac{\sum_{i=1}^{C} A{P}_{i}}{C}$$
(10)

where \(TP\) means the object belongs to a given class of Lentinula Edodes logs and the model detects that class; \(FP\) means the object does not belong to that class but the model detects it as that class; and \(FN\) means the object belongs to that class but the model fails to detect it as such. \(AP\) is the area under the Precision–Recall curve, and \(mAP\) is the average of the \(AP\) values over all classes. When the IoU threshold is set to 0.5, the metric is mAP@0.5; mAP@0.5:0.9 averages over IoU thresholds from 0.5 to 0.9.
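Equations (7)–(10) can be computed directly from detection counts; the counts and curve points below are hypothetical, chosen only to exercise the formulas.

```python
def precision_recall(tp: int, fp: int, fn: int):
    """Eqs. (7) and (8): Precision = TP/(TP+FP), Recall = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(recalls, precisions) -> float:
    """Eq. (9): AP = integral of P(r) dr, approximated over sampled points."""
    ap, prev_r = 0.0, 0.0
    for r, p in sorted(zip(recalls, precisions)):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

def mean_ap(ap_per_class) -> float:
    """Eq. (10): mAP is the mean of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)

# Hypothetical counts for one class at an IoU threshold of 0.5
p, r = precision_recall(tp=90, fp=10, fn=30)
ap = average_precision([0.25, 0.5, 1.0], [1.0, 0.8, 0.5])
print(p, r, ap)
```

In practice the PR curve is sampled at many confidence thresholds per class, and mAP@0.5:0.9 repeats this AP computation at each IoU threshold before averaging.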

Results and analysis

Model visualization analysis

After training, the feature extraction results of the first convolutional layer, the backbone module, and the last convolutional layer were visualized and analyzed using class activation mapping (CAM) [47]; the information the network model attends to can be seen from the visualized feature maps. An image of small object Lentinula Edodes log contamination was randomly selected from the training set for feature visualization; the red box marks the contaminated area. The results are shown in Fig. 8, which presents feature visualizations for the three individual improvement strategies (SPD-Conv, RepVGG, and the WIoUv3 regression loss) and for the combined SRW-YOLO model at three stages: the first convolutional layer, the feature extraction backbone, and the last convolutional layer. The darker the red in the heat map, the more the model attends to that part of the image, followed by yellow; the bluer a region, the more the model treats it as redundant information.

Fig. 8

Visualization of the feature map

As can be seen from the first-layer convolutional feature maps, the three improvement strategies mainly focus on low-level features of the Lentinula Edodes logs, such as edges and textures. The feature maps of the feature extraction backbone show more advanced, more localized feature attention. SRW-YOLO accurately locates the small contaminated areas, and all three improvement strategies focus on the contaminated areas of the logs relatively accurately, although each also attends to background redundancy to varying degrees. The feature maps of the last convolutional layer show that the features extracted by the different improvement strategies are more abstract and refined, revealing how the model focuses on discriminative features in the final stage. All of the improvement strategies ultimately focused on the two contaminated areas. However, SPD-Conv attended too strongly to both areas and took in more redundant pixels, while RepVGG and WIoUv3 over-attended to the area on the right and extracted weak features for the contaminated area below; SRW-YOLO, in contrast, focuses on the key pixel areas and is more accurate. The feature maps from the backbone module to the last layer show that the model proposed in this study reinforces useful feature maps, suppresses unnecessary features, and extracts small object contamination features from the images more effectively.

Analysis of experimental results

To verify the positive impact of the proposed improvement strategies on the network, ablation experiments were conducted on the Lentinula Edodes log dataset. Six sets of experiments (including the YOLOv7 baseline) were conducted, adding different improvement modules for comparison with YOLOv7, with Precision, Recall, mAP@0.5, and FPS as the measures. The results of the ablation experiments are shown in Table 2.

Table 2 Results of ablation experiments

As can be seen from the table, Experiment 1 gives the detection results of the original YOLOv7 network. In Experiment 2, Precision, Recall, and mAP@0.5 improve by 2.33%, 1.93%, and 1.97%, respectively, indicating that during downsampling SPD-Conv effectively alleviates the impact of the rapid resolution decrease in Lentinula Edodes log images, strengthening the learning of effective feature representations, avoiding the loss of fine-grained information, and helping improve detection accuracy on mobile edge devices. In Experiment 3, Precision, Recall, and mAP@0.5 improve by 1.77%, 0.51%, and 1.56%, respectively, indicating that after structural re-parameterization of the efficient aggregation network module, RepVGG reduces the computational load and memory usage of model inference while improving the efficiency and accuracy of inference on contaminated areas, reducing both the pressure of mobile edge inference and the burden on cloud servers. In Experiment 4, using WIoUv3 as the boundary regression loss improves Precision by 1.56%, Recall by 1.05%, and mAP@0.5 by 1.58% over YOLOv7, indicating that with WIoUv3's wise gradient gain allocation strategy the model focuses more on ordinary-quality anchor boxes, making the output more accurate. In Experiment 5, Precision improves by 3.15% and mAP@0.5 by 2.64% over YOLOv7, showing that introducing the SPD-Conv and RepVGG modules together improves inference efficiency while avoiding the loss of location and size information about the contamination, which in turn improves detection accuracy.
Experiment 6 integrated the above improved methods, and it can be clearly seen that the detection effect is the best. Precision reaches 97.63%, which is 4.62% better than YOLOv7; Recall reaches 96.43%, 3.63% higher than YOLOv7; and mAP@0.5 reaches 98.62%, which is 2.31% better than YOLOv7. At the same time, it also maintains good real-time detection, which can meet the requirements of small object Lentinula Edodes logs contamination detection in mobile edge computing and cloud computing environments.

The ablation experiments only verify the effectiveness of the improved strategies relative to the original algorithm; whether the model reaches a leading level among different models still requires further proof. Therefore, under the same experimental conditions, a series of comparative experiments was conducted to compare the improved method with current mainstream one-stage object detection methods on the Lentinula Edodes log dataset.

A comparison of the training results of different models is shown in Fig. 9. From the figure, it can be seen that the value of mAP@0.5 of the improved algorithm in this study is significantly higher than the other three models.

Fig. 9

Comparison of training mAP@0.5 curves of different models

Figure 10 presents a comparison of the regression loss curves for different models with training time. After 40 iterations, the loss curves of different models gradually and steadily converge. It can be seen that YOLOv6m has poor loss convergence in this dataset and YOLOv5l has an overfitting problem after 100 training iterations. YOLOv5l and YOLOv6m are much less effective than YOLOv7 in terms of regression loss. The model proposed in this study shows a better drop rate and convergence ability than YOLOv7, thus proving that the improvement of the boundary regression loss function improves the convergence ability of the network.

Fig. 10

Comparison of training box_loss curves of different models

Table 3 lists the comparison results of the evaluation metrics of different models. ResNeXt-50 (32 × 4d), MobileNetV3-YOLOv4, and Ghost-YOLOv4 are Zu's methods; compared with the mainstream YOLO series algorithms, their performance on small object Lentinula Edodes log contamination detection leaves room for improvement. Although the detection speed of the proposed SRW-YOLO model is not the highest, it is much better than the other models on mAP@0.5, Recall, and mAP@0.5:0.9, allowing it to maintain a good balance between detection accuracy and real-time performance.

Table 3 Comparison of evaluation indicators of different models
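The Precision and Recall reported in Table 3 follow the standard detection definitions, where a prediction counts as a true positive when its IoU with a ground-truth box exceeds the threshold (0.5 for mAP@0.5). The counts below are hypothetical, purely to illustrate the computation:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical counts for one contamination class at an IoU threshold of 0.5.
# mAP@0.5 then averages the area under the precision-recall curve over classes;
# mAP@0.5:0.95 additionally averages over IoU thresholds from 0.5 to 0.95.
p, r = precision_recall(tp=90, fp=10, fn=20)
```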

At the same time, to further demonstrate the superiority of the SRW-YOLO improvement strategy, Table 4 lists the comparison results for YOLOv7 and SRW-YOLO on the three classes of Lentinula Edodes log contamination. Compared with YOLOv7, SRW-YOLO improves to varying degrees on Precision, Recall and mAP@0.5. The original model performs worst when detecting Lentinula Edodes logs contaminated by Aspergillus flavus, yet on this class SRW-YOLO improves Precision, Recall and mAP@0.5 by 8.12%, 5.33% and 2.36%, respectively, over YOLOv7. This shows that the SRW-YOLO model proposed in this article has clear advantages in actual detection and can accurately detect the different classes of Lentinula Edodes log contamination.

Table 4 Comparison of evaluation indicators of different classes

For a more intuitive understanding of model performance, Fig. 11 shows the detection results of the four models on a randomly selected image from the test set, with the red box marking the area contaminated by Trichoderma viride. Although all four models are able to detect the contamination type, YOLOv5l, YOLOv6m, and YOLOv7 yield lower object confidence and poorer detection results. In contrast, SRW-YOLO shows clear superiority, reaching 95% object confidence and accurately detecting the small contaminated area.

Fig. 11

Detection results of different models

In summary, the Lentinula Edodes log contamination detection model proposed in this study has strong generalization ability and robustness. During the cultivation stage, this model can accurately locate areas with small contamination objects on Lentinula Edodes logs and correctly identify the type of contamination.

Conclusion

In this study, a model for small object Lentinula Edodes log contamination detection (SRW-YOLO), suitable for mobile edge computing and cloud computing environments, was constructed based on YOLOv7. SPD-Conv was introduced into the MP module of the feature fusion network to improve the model's learning of the location and semantic information of small contamination objects, which helps enhance detection accuracy on resource-limited mobile devices. The ELAN structure in the backbone network was reparameterized using the RepVGG architecture to decouple training from inference, allowing contamination types to be detected efficiently and accurately while reducing the inference pressure on mobile edge devices and the burden on cloud servers. The WIoU loss function was set as the boundary regression loss of the network to reduce the competitiveness of high-quality anchor boxes while minimizing the harmful gradients generated by low-quality samples, improving the overall performance of contamination detection. The experimental results showed that SRW-YOLO detects small object Lentinula Edodes log contamination significantly better than the current mainstream one-stage object detection models. In summary, SRW-YOLO provides an efficient, accurate and practical small object contamination detection method that can be deployed to Android mobile devices or embedded devices. In addition, companies or individuals using the proposed network can lower the performance requirements of cloud computing servers and thus reduce the cost of renting them.
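The SPD-Conv building block mentioned above replaces strided convolution and pooling with a space-to-depth rearrangement, so fine-grained information about small objects is moved into channels rather than discarded. A minimal NumPy sketch of that rearrangement (illustrative only; in the actual module it is followed by a non-strided convolution):

```python
import numpy as np

def space_to_depth(x, scale=2):
    """Rearrange an (H, W, C) feature map into (H/scale, W/scale, C*scale^2).
    Every pixel value is kept, just moved into the channel dimension,
    unlike strided convolution or pooling, which discard spatial detail."""
    h, w, c = x.shape
    assert h % scale == 0 and w % scale == 0
    blocks = x.reshape(h // scale, scale, w // scale, scale, c)
    blocks = blocks.transpose(0, 2, 1, 3, 4)  # group each scale x scale patch
    return blocks.reshape(h // scale, w // scale, c * scale * scale)

x = np.arange(4 * 4 * 3, dtype=np.float32).reshape(4, 4, 3)
y = space_to_depth(x)
print(y.shape)  # (2, 2, 12): half the resolution, four times the channels
```

Because the output holds exactly the same values as the input, the subsequent convolution can still exploit the fine spatial detail of small contamination regions.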

However, there are still some areas for improvement. The current Lentinula Edodes log dataset has a relatively simple background, and the model may not perform well when the background is more complex or the image collection environment is dim. Therefore, in subsequent work, the dataset will be further enriched and the proposed Lentinula Edodes log contamination detection method will be further optimized.