1 Introduction

As the Internet of Things (IoT) technology and the information industry undergo rapid development, QR codes, serving as a mode of information storage and sharing, have found extensive application in various domains such as logistics management, mobile payments, product traceability, healthcare services, and intelligent transportation [1]. In contrast to conventional one-dimensional barcodes, QR codes exhibit a range of advantages including larger data storage capacity, diverse encoding of data types, swifter detection speeds, and a wider array of application scenarios. As illustrated in Fig. 1, the QR code comprises pivotal modules such as position detection, format, and data information [2]. However, in intelligent logistics and transportation systems, challenges arise when smart mobile robots engage in recognition and navigation tasks, as the QR code systems they carry can be influenced by factors such as motion velocity, lens tremors, or device vibrations. This can result in blurry QR code images, consequently leading to a decrease in recognition accuracy.

Fig. 1
figure 1

Structure of a QR Code

When QR code images are affected by motion blur, deblurring algorithms apply a series of processing steps to improve the signal-to-noise ratio and restore as much clarity to the blurred image as possible. Researchers have investigated QR code deblurring in depth and proposed various motion deblurring algorithms tailored to QR codes or binary images. Van et al. [3] introduced a regularization approach to address blurring in QR code images. Pan et al. [4] proposed a deblurring algorithm based on the dark channel prior, as well as a method based entirely on the Kullback-Leibler divergence [5]; these methods can perform simultaneous motion deblurring and denoising of QR codes. However, such traditional methods are time-consuming and recover poorly when QR codes are severely affected by motion blur, which limits recognition efficiency. In recent years, deep learning-based methods have been successfully applied to QR code deblurring. Tiwari et al. [6] combined the Ridgelet Transform (RT) and a Radial Basis Function (RBF) neural network to address barcode deblurring, but the algorithm is complex and time-consuming. Similarly, Pu et al. [7] employed a dual convolutional neural network for deblurring; however, the blur kernel must be specified manually for training, which departs from the complexity of real-world motion blur. Li et al. [8] employed a feature extraction architecture based on an encoder-decoder framework, deblurring QR codes directly in an end-to-end manner. These end-to-end deep learning methods excel at motion deblurring in dynamic scenes and have clear advantages over traditional methods. Nevertheless, some deep learning-based QR code deblurring methods still suffer from problems: artifacts may appear in the restored image, edge contours are not sharp enough, the network model is overly complex, and training is relatively difficult. In addition, there is still a lack of blurred QR code datasets captured in real dynamic scenarios and designed for training neural networks, which remains one of the open problems in this field.

In response to these problems, this paper proposes an algorithm for the recognition of motion-blurred QR codes based on generative adversarial networks and attention mechanisms. The primary goal of this algorithm is to achieve a clearer image restoration through end-to-end processing, obtain superior visual results, accentuate the distinctiveness of edge features, and simultaneously minimize the loss of fine details. Building upon this foundation, this study makes significant contributions in the following four aspects:

(1) A multi-scale generative adversarial deblurring network is proposed, with improved residual blocks and multi-scale feature extraction modules at its core. Features are extracted at multiple scales to capture image details more comprehensively. Through this end-to-end approach, rapid and efficient restoration of QR code images is achieved.

(2) An improved efficient channel attention (IECA) module is designed to adaptively learn useful information in feature maps, assigning higher weights to more severely blurred regions and more informative channels so that feature information is aggregated and propagated accurately.

(3) For the first time, the WGAN-div (Wasserstein divergence) loss function is introduced to the field of image deblurring. W-div is more tractable as a GAN objective: it provides better gradient signals, converges more easily during training, and yields higher-quality samples.

(4) A blurred QR code dataset for dynamic scenarios is constructed and used to optimize the training and evaluation of the model. With this strategy, our algorithm handles blurred QR code deblurring more effectively, further improving the recognition accuracy of blurred QR codes.

The remainder of this paper is organized as follows: Sect. 2 provides a review of the relevant research related to this study. Section 3 presents the overall network architecture and loss functions of the proposed algorithm. Section 4 describes the dataset, experimental process and results in detail to verify the effectiveness of our algorithm. Lastly, Sect. 5 concludes the paper.

2 Related Work

2.1 Image Deblurring

Image deblurring is a challenging task within computer vision, aiming to recover a clear image from a blurred input image [9]. Typically, the process of image blurring can be mathematically modeled as follows:

$$\begin{aligned} B=k\otimes I+n \end{aligned}$$
(1)

In the equation, B represents the motion-blurred image, I represents the clear image, n represents the vector form of zero-mean Gaussian noise, k represents the convolutional blur kernel, and \(\otimes\) represents the two-dimensional spatial convolution operator.

The general mathematical expression for image restoration is given by:

$$\begin{aligned} I=k^{-1}\otimes B \end{aligned}$$
(2)

In the equation, \(k^{-1}\) represents the corresponding deconvolution blur kernel.
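For concreteness, the following sketch simulates the degradation model of Eq. 1 in Python; the horizontal 15-pixel kernel, the noise level, and the placeholder image are illustrative assumptions, since real motion blur kernels are unknown and spatially varying.

```python
# A minimal sketch of the degradation model in Eq. (1): B = k (*) I + n.
import numpy as np
from scipy.signal import convolve2d

def synthesize_motion_blur(sharp, kernel_len=15, noise_sigma=0.01):
    """Apply a hypothetical linear motion blur kernel k and additive Gaussian noise n."""
    k = np.zeros((kernel_len, kernel_len), dtype=np.float32)
    k[kernel_len // 2, :] = 1.0 / kernel_len           # horizontal motion streak
    blurred = convolve2d(sharp, k, mode="same", boundary="symm")
    noise = np.random.normal(0.0, noise_sigma, sharp.shape)
    return np.clip(blurred + noise, 0.0, 1.0)

# Usage: sharp is a grayscale image normalized to [0, 1].
sharp = np.random.rand(256, 256).astype(np.float32)    # placeholder image
blurred = synthesize_motion_blur(sharp)
```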

Research on image deblurring can be divided into two categories: non-blind deblurring methods designed for known blur kernels [10,11,12], and blind deblurring methods that target unknown blur kernels [13,14,15,16]. Traditional image restoration methods assume that the degradation process is known, i.e., that the blur kernel k is determined. In practice, however, blur kernels are usually unknown, so traditional deblurring methods must estimate potential blur kernels and model different types of blur (e.g., uniform and non-uniform) [17]. Furthermore, real-world blur is often far more complex than these models assume, and blur-prediction models require extensive parameter tuning and high time costs [18].

In recent years, with the rapid development of deep learning, convolutional neural networks (CNN) [19] have found widespread application in the field of image deblurring. These end-to-end methods [20,21,22,23] have demonstrated superior results in most scenarios compared to traditional methods. For instance, Sun et al. [24] employed a CNN to predict patch-level motion blur probability distributions, which were then combined with Markov random field models to infer dense non-uniform motion blur fields. Although this addressed the complex non-uniform motion blur that traditional methods struggle with, it came at the cost of increased computational time. Nah et al. [25] adopted a coarse-to-fine strategy, utilizing multi-scale CNNs to progressively restore clear images. Kupyn et al. [26] introduced a GAN-based image deblurring method, treating image deblurring as a specialized image-to-image translation task, which yielded significant improvements in texture details. Zhang et al. [27] proposed a spatial-transformer-based recurrent neural network to tackle deblurring in dynamic scenes, embedding the recurrent neural network within an encoder-decoder architecture. Kupyn et al. [28] built upon DeblurGAN, introduced feature pyramid networks (FPN) [29], and leveraged a relativistic discriminator to further enhance image deblurring performance. Mao et al. [30] designed a deblurring network based on an autoencoder (AE), integrating residual fast Fourier transform and convolutional blocks to deblur images in the frequency domain. However, while these end-to-end image deblurring methods perform well on general datasets, they are not universally applicable to all types of blur. In particular, they are less effective on QR code deblurring, a task that centers on restoring texture details. Furthermore, in the field of QR code recognition there are many excellent algorithms, such as [31,32,33,34], but they typically perform well only on clear QR code images, and their recognition accuracy degrades on blurry ones. Moreover, few deep learning-based algorithms have so far been designed specifically for blurry QR code recognition.

2.2 Generative Adversarial Networks

Generative Adversarial Network (GAN) is a network architecture proposed by Goodfellow et al. in 2014 [35]. It has achieved remarkable success in various domains such as image synthesis, semantic image editing, style transfer, and image super-resolution [36]. The GAN model, as shown in Fig. 2, consists of two components: a generator and a discriminator. Its objective is to generate high-quality samples through an adversarial mechanism. The generator takes a random noise vector as input and transforms it into synthetic samples that resemble real samples. The discriminator, on the other hand, takes both real and generated samples and scores them to reflect the probability that the input sample is a real sample. This adversarial training process compels the generator to continually improve in order to better deceive the discriminator, resulting in the generation of more realistic samples.

Fig. 2
figure 2

GAN Model

Formally, the task of the discriminator D is to estimate \(P(y \mid x)\), the probability of label y given a sample x, while the generator G produces samples from a latent space, denoted below as G(z). G and D are trained simultaneously: as G produces better outputs, it becomes harder for D to distinguish them from real samples, and as D becomes more accurate, it becomes harder for G to fool it. D therefore tries to maximize its classification accuracy, while G tries to minimize it. During training, this competition between the generator and the discriminator can be formalized as the optimization of the minimax objective in Eq. 3:

$$\begin{aligned} \min _G \max _D E_{x \sim P_r}[\log (D(x))]+\underset{z \sim P_z}{E}[\log (1-D(G(z)))] \end{aligned}$$
(3)

In the equation, x is the real data, z is a random vector in the latent space, G(z) is the generated sample, \(P_r\) is the distribution of the real data, \(P_z\) is the distribution over the latent space, D(x) is the probability the discriminator assigns to a real sample, D(G(z)) is the probability it assigns to a generated sample, and E denotes the expectation under the corresponding distribution.
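As an illustration of this adversarial scheme, the sketch below shows one training step corresponding to Eq. 3 in PyTorch; the generator update uses the standard non-saturating variant (maximizing \(\log D(G(z))\)), and the networks, optimizers, and a sigmoid-terminated discriminator are assumptions for illustration rather than details from the paper.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def gan_step(G, D, x_real, z, opt_g, opt_d):
    """One adversarial step for Eq. (3); D is assumed to end with a sigmoid."""
    # --- discriminator update: maximize log D(x) + log(1 - D(G(z))) ---
    opt_d.zero_grad()
    d_real = D(x_real)
    d_fake = D(G(z).detach())
    loss_d = bce(d_real, torch.ones_like(d_real)) + \
             bce(d_fake, torch.zeros_like(d_fake))
    loss_d.backward()
    opt_d.step()

    # --- generator update: non-saturating variant, maximize log D(G(z)) ---
    opt_g.zero_grad()
    d_fake = D(G(z))
    loss_g = bce(d_fake, torch.ones_like(d_fake))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```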

Through repeated iterative training, the generator and discriminator gradually improve their respective performance: the generator strives to produce more realistic, higher-quality samples, while the discriminator strives to distinguish generated samples from real ones more accurately. However, the traditional adversarial loss based on the JS divergence (Jensen-Shannon divergence) suffers from a problem: when the two distributions do not overlap, it easily leads to vanishing gradients and mode collapse. To address this, Arjovsky et al. [37] introduced the Wasserstein-1 distance (W-met) into the GAN framework, as shown in Eq. 4:

$$\begin{aligned} w\left( p_r, p_g\right) =\inf _{\gamma \in \Pi \left( p_r, p_g\right) } E_{(x, y) \sim \gamma }[\Vert x-y\Vert ] \end{aligned}$$
(4)

In the equation, x and y denote real and generated samples, and \(P_r\) and \(P_g\) denote their respective data distributions. \(\Pi \left( p_r, p_g\right)\) is the set of all possible joint distributions of \(P_r\) and \(P_g\). For each joint distribution \(\gamma\), the distance \(\Vert x-y\Vert\) between corresponding samples can be computed, and its expected value E is taken over the samples.

The W-met provides a continuous and differentiable metric based on the minimum cost of moving mass between two distributions, thereby overcoming the instability caused by the Jensen-Shannon divergence used in traditional GANs. The introduction of W-met removes the logarithmic operation from the loss function and the sigmoid function from the discriminator network. In addition, W-met is computed in its dual form under the k-Lipschitz constraint, and its value function is as follows:

$$\begin{aligned} \min _G\max _{\left\| D\right\| _L\le 1}\underset{x\sim P_r}{E}[D(x)]-\underset{{\tilde{x}}\sim P_g}{E}[D({\tilde{x}})] \end{aligned}$$
(5)

Where x is the real data, \(P_r\) is the distribution of the real data, \({\tilde{x}}\) denotes the generated data, \(P_g\) denotes the distribution of the generated data, and D denotes the discriminator output. Here, \(\Vert D\Vert _L \le 1\) requires the discriminator D to satisfy the 1-Lipschitz constraint, which is enforced by clipping the discriminator's weight parameters to the interval [−1, 1]. However, such weight clipping may limit the search space of the function and lead to suboptimal solutions. To overcome the problems caused by weight clipping, Gulrajani et al. [38] introduced a gradient penalty term into the Wasserstein generative adversarial network as an alternative way of enforcing the Lipschitz constraint. The objective function is defined as follows:

$$\begin{aligned} L=\underbrace{\underset{x \sim P_r}{E}[D(x)]-\underset{{\tilde{x}} \sim P_g}{E}[D({\tilde{x}})]}_{\text{ Wasserstein } \text{ term } }+k \underbrace{\underset{{\hat{x}} \sim P_y}{E}\left[ \left( \Vert \nabla D({\hat{x}})\Vert _2-1\right) ^2\right] }_{\text{ gradient } \text{ penalty } } \end{aligned}$$
(6)

where \(\nabla\) is the gradient operator, x is the real data, \({\tilde{x}}\) denotes the generated data, \({\hat{x}}\) is a linear interpolation between real and generated samples, k is the weight of the gradient penalty term, and \(P_y\) is the distribution obtained by uniformly sampling points between the real data distribution \(P_r\) and the generated data distribution \(P_g\), which ensures that the gradient penalty covers both distributions. However, when the data is sparse, it is difficult to satisfy the k-Lipschitz constraint over the whole data domain.
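A minimal PyTorch sketch of the gradient penalty term in Eq. 6 is given below; the penalty weight k = 10 and the uniform interpolation scheme are common WGAN-GP choices and are assumptions here, not values taken from the paper.

```python
import torch

def gradient_penalty(D, x_real, x_fake, k=10.0):
    """Gradient penalty of Eq. (6), computed on interpolated samples x_hat."""
    eps = torch.rand(x_real.size(0), 1, 1, 1, device=x_real.device)
    x_hat = (eps * x_real + (1.0 - eps) * x_fake).requires_grad_(True)
    d_hat = D(x_hat)
    grads = torch.autograd.grad(outputs=d_hat, inputs=x_hat,
                                grad_outputs=torch.ones_like(d_hat),
                                create_graph=True, retain_graph=True)[0]
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return k * ((grad_norm - 1.0) ** 2).mean()
```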

The use of generative adversarial networks is a very promising approach to motion deblurring [26,27,28]. In this paper, the principles and training method of generative adversarial networks are used to design a GAN model specifically adapted to the image deblurring task. The generator network receives a blurry QR code image as input and outputs a clear QR code image. Through continuous training iterations, the generator is expected to produce more realistic, higher-quality deblurred QR code images, providing better visual perception and QR code recognition quality.

3 The Proposed Method

Within specific dynamic scenarios, QR codes can encounter pronounced effects of motion blur. In this section, a comprehensive exposition of our proposed algorithm for motion-blurred QR code recognition is presented. The overarching architecture of this algorithm chiefly comprises two key components: (a) the deblurring network architecture; and (b) the loss function.

3.1 Network Architecture

Figure 3 illustrates the overall network architecture based on conditional generative adversarial networks (CGAN) proposed in this paper. Our network model is designed to solve the blurred QR code image restoration problem, especially under complex motion blur. First, we propose a feature-level multi-scale feature extraction network that enriches features by adopting convolution kernels of different sizes and the "Split-Transform-Merge" (STM) strategy, exploring deblurring cues at different processing levels. In the feature extraction network, we introduce residual learning to make the network easier to optimize and to accelerate the restoration of motion-blurred QR code images. To further improve network performance, we also introduce a self-designed dual-branch efficient attention module (IECA) into the feature extraction network, which adaptively integrates local feature information and global context dependencies to remove blur effectively and better restore image texture details. In addition, we design a multi-scale dilated convolution module to expand the receptive field of the network so that complex motion blur can be handled better. For the discriminator we adopt a PatchGAN discriminator, which operates on randomly cropped image patches of size 70 \(\times\) 70 to better capture local information and provide fine-grained feedback. Through this end-to-end image restoration structure, the network can handle complex motion blur efficiently.

Fig. 3
figure 3

Overall Network Framework

3.1.1 Generator

Multi-scale representation plays an important role in deep learning: it enables models to capture contextual information more effectively, enriches feature diversity, and offers clear advantages in many visual tasks. Res2Net achieves notable results in numerous computer vision tasks by constructing hierarchical residual connections within a single residual block to represent multi-scale features and by expanding the receptive field of each network layer. Meanwhile, InceptionV3 introduces convolutional kernels and pooling operations of varying sizes along parallel branches, using diverse receptive fields to extract image features at different scales; the extracted features are then merged, improving the network's expressive power and feature extraction capability. InceptionV3 also incorporates bottleneck structures and factorized convolutions to limit the number of parameters and the computational cost. Drawing inspiration from Res2Net and InceptionV3, we devise an enhanced residual module and a multi-scale feature extraction module and integrate them into the generator network. The generator network comprises two 3\(\times\)3 convolutional blocks with a stride of 1, two multi-scale feature extraction modules, two 3\(\times\)3 convolutional blocks with a stride of 2, nine improved residual modules, and two transposed convolutional blocks. Each convolutional unit incorporates an instance normalization (IN) layer and a LeakyReLU activation layer, which confines coefficient ranges and stabilizes generator training. Additionally, a global skip connection is introduced so that the network learns a mapping from blurry images to residual corrections, yielding faster training and improved generalization.
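The following is a hedged PyTorch skeleton of the generator layout described above; the MSDM and IResB modules are stand-in stubs, and the channel widths, output activation, and the exact form of the residual addition on the input are assumptions the paper does not specify.

```python
import torch
import torch.nn as nn

def conv_block(in_c, out_c, stride=1):
    """3x3 convolution followed by instance normalization and LeakyReLU."""
    return nn.Sequential(
        nn.Conv2d(in_c, out_c, 3, stride=stride, padding=1),
        nn.InstanceNorm2d(out_c),
        nn.LeakyReLU(0.2, inplace=True),
    )

def identity_block(channels):
    """Stand-in for the MSDM / IResB modules described in Sects. 3.1.2 and 3.1.3."""
    return nn.Identity()

class Generator(nn.Module):
    def __init__(self, msfe_block=identity_block, res_block=identity_block, ch=64):
        super().__init__()
        self.head = nn.Sequential(conv_block(3, ch), conv_block(ch, ch))      # two stride-1 conv blocks
        self.msfe = nn.Sequential(msfe_block(ch), msfe_block(ch))             # two multi-scale modules
        self.down = nn.Sequential(conv_block(ch, ch * 2, stride=2),
                                  conv_block(ch * 2, ch * 4, stride=2))       # two stride-2 conv blocks
        self.body = nn.Sequential(*[res_block(ch * 4) for _ in range(9)])     # nine improved residual blocks
        self.up = nn.Sequential(                                              # two transposed conv blocks
            nn.ConvTranspose2d(ch * 4, ch * 2, 4, stride=2, padding=1),
            nn.InstanceNorm2d(ch * 2), nn.LeakyReLU(0.2, inplace=True),
            nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1),
            nn.InstanceNorm2d(ch), nn.LeakyReLU(0.2, inplace=True),
        )
        self.tail = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, blurry):
        x = self.up(self.body(self.down(self.msfe(self.head(blurry)))))
        # global skip connection: predict a residual correction to the blurry input
        return torch.tanh(self.tail(x)) + blurry
```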

3.1.2 Multi-scale Feature Extraction Block

In this paper, multiple parallel convolutional branches and global average pooling are introduced to capture features at different scales. This multi-branch structure helps extract rich local and global features, enhancing the effectiveness of image deblurring. Increasing the receptive field of the multi-scale branches provides a wider range of contextual information, which helps the network understand the structure and details of the image. To achieve this without increasing computational complexity, this paper introduces dilated convolutions in the multi-scale branches, as shown in Fig. 4b. In a dilated convolution, a fixed number of zero values are inserted between the elements of the convolution kernel, effectively expanding its receptive field. Increasing the dilation rate extends the receptive field further, and its size depends on the dilation rate. Equation 7 gives the formula for the receptive field.

$$\begin{aligned} N=(K-1) \times \textrm{d}+1 \end{aligned}$$
(7)

where N is the size of the receptive field, K is the size of the convolution kernel, and d is the dilation rate. This paper devises a multi-scale dilated convolution module (MSDM), as illustrated in Fig. 4a. The block is composed of three parallel convolutional layers with dilation rates of 1, 2, and 4, yielding receptive fields of 3, 5, and 9, respectively. This design both enlarges the receptive field and captures feature information at different scales.
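A minimal sketch of such a multi-scale dilated convolution block is shown below; the 3\(\times\)3 kernels with dilation rates 1, 2, and 4 follow the description above, while the concatenation-plus-1\(\times\)1-convolution fusion and the ReLU activations are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class MSDM(nn.Module):
    """Three parallel 3x3 branches with dilation 1, 2, 4 (receptive fields 3, 5, 9 by Eq. (7))."""
    def __init__(self, channels):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in (1, 2, 4)                     # N = (3 - 1) * d + 1
        ])
        self.fuse = nn.Conv2d(3 * channels, channels, 1)   # assumed merge: concat + 1x1 conv

    def forward(self, x):
        feats = [torch.relu(b(x)) for b in self.branches]
        return self.fuse(torch.cat(feats, dim=1))
```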

Fig. 4
figure 4

Multi-scale feature extraction module

3.1.3 IResB and IECA modules

In this paper, a Res2Net-style module is used to reduce the computational burden and the ghosting contours caused by redundant features while increasing the receptive field. To further improve network performance, dilated convolutions are introduced into the residual block structure to expand the receptive field. In addition, an improved efficient channel attention mechanism is placed after the residual block, as shown in Fig. 5.

Fig. 5
figure 5

Improved Residual Block (IResB)

The ECA module [39] employs a local cross-channel interaction strategy without dimensionality reduction, efficiently capturing interdependencies among channels. It adaptively adjusts features channel by channel, enhancing meaningful feature information and suppressing irrelevant information. Introducing the ECA module into the deblurring network can further improve the restoration of image details and structures. The ECA module first performs global average pooling (GAP) on each channel to obtain a vector of length C, in which each element represents the average value of the corresponding channel. A one-dimensional convolution is then applied to this vector to produce a weight vector of length C, and finally each channel of the original feature map is weighted accordingly to form the aggregated feature \(y \in \mathbb {R}^C\). Channel attention is then learned according to Eq. 8:

$$\begin{aligned} \omega =\sigma (W y) \end{aligned}$$
(8)

where \(\sigma\) denotes the sigmoid function and W is a parameter matrix of size \(C\times C\), containing \(k\times C\) parameters, as shown in Eq. 9:

$$\begin{aligned} \left[ \begin{array}{cccccccc} \omega ^{1,1} &​ \cdots &​ \omega ^{1, k} &​ 0 &​ 0 &​ \cdots &​ \cdots &​ 0 \\ 0 &​ \omega ^{2,2} &​ \cdots &​ \omega ^{2, k+1} &​ 0 &​ \cdots &​ \cdots &​ 0 \\ \vdots &​ \vdots &​ \vdots &​ \vdots &​ \ddots &​ \vdots &​ \vdots &​ \vdots \\ 0 &​ \cdots &​ 0 &​ 0 &​ \cdots &​ \omega ^{C, C-k+1} &​ \cdots &​ \omega ^{C, C} \end{array}\right] \end{aligned}$$
(9)

For Eq. 9, the weight of \(y_i\) is calculated solely by considering interactions among its k neighbors, as follows:

$$\begin{aligned} \omega _i=\sigma \left( \sum _{j=1}^k \omega ^j y_i^j\right) , y_i^j \in \Omega _i^k \end{aligned}$$
(10)

where \(\Omega _i^k\) denotes the set of k neighboring channels of \(y_i\); a more efficient approach is to let all channels share the same set of learned parameters. In our experiments we observe that max pooling is a nonlinear operation while average pooling is linear, and that using a max pooling layer introduces nonlinear factors that help extract a better underlying representation of the image. We also find that combining global average pooling and global max pooling emphasizes important features while retaining the global context, which is more conducive to feature selection. In this study, the input is therefore split into two branches that undergo global average pooling and global max pooling separately; the two results are processed in parallel by a 1D convolution and then fed into a sigmoid layer. The resulting design is the improved efficient channel attention (IECA) module, whose structure is depicted in Fig. 6.

Fig. 6
figure 6

Improved Efficient Channel Attention (IECA) Module
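A hedged PyTorch sketch of the IECA idea is given below; the parallel global average and max pooling followed by a 1D convolution and a sigmoid follow the description above, while the shared convolution weights, the kernel size k = 3, and the summation of the two branches before the sigmoid are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class IECA(nn.Module):
    """Dual-branch efficient channel attention: GAP + GMP, 1D conv, sigmoid gate."""
    def __init__(self, k=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                          # x: (B, C, H, W)
        avg = x.mean(dim=(2, 3))                   # global average pooling -> (B, C)
        mx = x.amax(dim=(2, 3))                    # global max pooling     -> (B, C)
        w_avg = self.conv(avg.unsqueeze(1))        # 1D conv over the channel dimension
        w_max = self.conv(mx.unsqueeze(1))
        w = self.sigmoid(w_avg + w_max).squeeze(1).unsqueeze(-1).unsqueeze(-1)
        return x * w                               # channel-wise re-weighting
```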

3.1.4 Discriminator

The architecture of the discriminative network D employs a fully convolutional network structure, encompassing instance normalization and the LeakyReLU activation function. In this context, the negative slope of the LeakyReLU is set to \(\alpha =0.2\). The structural parameters of the discriminator network are elaborated in Table 1.

Table 1 Discriminator network architecture parameters
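As a rough illustration, the sketch below follows the common 70 \(\times\) 70 PatchGAN layout with instance normalization and LeakyReLU (\(\alpha = 0.2\)); the filter counts use the usual 64-128-256-512 progression and are assumptions, since the exact parameters are those listed in Table 1.

```python
import torch.nn as nn

def d_block(in_c, out_c, stride=2, norm=True):
    """Conv-IN-LeakyReLU block of the patch discriminator."""
    layers = [nn.Conv2d(in_c, out_c, 4, stride=stride, padding=1)]
    if norm:
        layers.append(nn.InstanceNorm2d(out_c))
    layers.append(nn.LeakyReLU(0.2, inplace=True))
    return layers

class PatchDiscriminator(nn.Module):
    def __init__(self, in_c=3):
        super().__init__()
        self.net = nn.Sequential(
            *d_block(in_c, 64, norm=False),
            *d_block(64, 128),
            *d_block(128, 256),
            *d_block(256, 512, stride=1),
            nn.Conv2d(512, 1, 4, padding=1),       # per-patch realism scores
        )

    def forward(self, x):
        return self.net(x)
```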

3.2 Loss Function

3.2.1 Adversarial Loss

Wu et al. [40] introduced a novel, relaxed version of the Wasserstein-1 distance termed Wasserstein divergence (W-div). Unlike W-met, W-div does not require the k-Lipschitz constraint [41] and instead incorporates a distribution divergence. By combining this divergence with the Wasserstein distance, WGAN-div comprehensively measures the dissimilarity between the generator's output distribution and the distribution of real samples. This holistic evaluation significantly enhances the visual quality of generated images and improves training stability. Moreover, WGAN-div outperforms WGAN-GP [40] in image generation tasks. Consequently, the WGAN-div loss is employed as the adversarial loss throughout training. It is defined as follows:

$$\begin{aligned} \min _G \max _D V(D, G)=\min _G \max _D \underset{G(z) \sim P_g}{E}[D(G(z))]-\underset{x \sim P_r}{E}[D(x)]-k \underset{{\hat{x}} \sim P_u}{E}\left[ \left\| \nabla _{{\hat{x}}} f({\hat{x}})\right\| ^p\right] \end{aligned}$$
(11)

Here, z represents the blurry image, x the clear image, and \({\hat{x}}\) is obtained as a linear combination of real and synthesized data points during sampling. \(P_u\) denotes a Radon probability measure, k regulates the impact of the gradient term on the objective function, and p corresponds to the \(L^p\) space of the function f. In our experiments, k is set to 2 and p to 6.
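A minimal PyTorch sketch of the WGAN-div objective in Eq. 11 is given below, written as the losses minimized in practice (the discriminator ascends V, so its loss is \(-V\)), with k = 2 and p = 6 as stated above; the uniform interpolation used to draw \({\hat{x}}\) is an assumption.

```python
import torch

def wgan_div_d_loss(D, real, fake, k=2.0, p=6.0):
    """-V(D, G) from Eq. (11): the discriminator ascends V, so we descend its negation."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1.0 - eps) * fake.detach()).requires_grad_(True)
    grads = torch.autograd.grad(D(x_hat).sum(), x_hat, create_graph=True)[0]
    div = (grads.view(grads.size(0), -1).norm(2, dim=1) ** p).mean()
    return D(real).mean() - D(fake.detach()).mean() + k * div

def wgan_div_g_loss(D, fake):
    """The generator descends the G-dependent term of V."""
    return D(fake).mean()
```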

3.2.2 Content Loss

Using traditional pixel-level loss functions, such as L1 or L2 loss, often leads to the emergence of artifacts or blurriness in generated images [42]. This is because when these loss functions encourage the pixel values of generated images to closely match those of real images, it may lead to the loss of high-frequency details in the generated images. In contrast, perceptual loss [43] mitigates this issue by evaluating higher-level feature representations. To capture more intricate semantic information from images, enhancing the clarity and authenticity of generated images, perceptual loss is adopted. It is calculated by Eq. 12:

$$\begin{aligned} L_p=\frac{1}{W_{i, j} H_{i, j}} \sum _{x=1}^{W_{i, j}} \sum _{y=1}^{H_{i, j}}\left( \phi _{i, j}\left( I^S\right) _{x, y}-\phi _{i, j}\left( G\left( I^B\right) \right) _{x, y}\right) ^2 \end{aligned}$$
(12)

Here, \(I^B\), \(I^S\) and \(\varvec{G}\bigl (\varvec{I}^B\bigr )\) respectively denote the blurry image, the clear image, and the deblurred image. \(\phi _{i,j}\) represents the feature map obtained from the j-th convolution (after activation) before the i-th maxpooling layer within the VGG19 network. \(W_{i,j}\) and \(H_{i,j}\) stand for the dimensions of the feature map. In this study, the features from the \(\phi _{3,3}\) convolutional layer of VGG-19 are utilized to compute the mean squared error of the deblurred image. Through empirical validation, it has been demonstrated that this choice ensures the attainment of high-quality deblurring outcomes.
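A minimal sketch of the perceptual loss in Eq. 12 is shown below; slicing the torchvision VGG-19 feature extractor up to the ReLU after conv3_3 to approximate \(\phi _{3,3}\), and omitting input normalization, are assumptions made for brevity.

```python
import torch
import torch.nn.functional as F
from torchvision import models

class PerceptualLoss(torch.nn.Module):
    """MSE between VGG-19 feature maps of the deblurred and sharp images (Eq. (12))."""
    def __init__(self):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features
        # keep layers up to the activation after conv3_3 (assumed indexing of torchvision's VGG-19)
        self.extractor = torch.nn.Sequential(*list(vgg.children())[:16]).eval()
        for p in self.extractor.parameters():
            p.requires_grad_(False)

    def forward(self, deblurred, sharp):
        return F.mse_loss(self.extractor(deblurred), self.extractor(sharp))
```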

Finally, we construct the loss function as a weighted sum of the adversarial loss and the content loss, as shown in Eq. 13:

$$\begin{aligned} L=L_{a d v}+\lambda L_p \end{aligned}$$
(13)

Where the value of \(\lambda\) is 100.

4 Experiments and Results

4.1 Dataset

The GOPRO dataset [25] is a publicly available resource widely used in image deblurring research. It emulates motion blur by averaging consecutive short-exposure frames, producing realistic blurry images: a sequence of frames is averaged to generate the blurred image, while the central frame of the sequence serves as the clear reference. In this study, the method proposed in [25] is used to create the dataset. QR code videos are captured at 240 frames per second, averaging 15–21 consecutive frames per blurred image, and at 120 frames per second, averaging 9–15 consecutive frames, to construct the motion-blurred QR code dataset. The final dataset comprises two parts: 3214 pairs of blurry and clear images derived from the GOPRO dataset, and 2424 pairs of blurred and clear QR codes captured by ourselves. All images in the dataset have a resolution of \(1280\times 720\). In addition to the synthesized data, genuinely captured blurred QR code images were also used for testing. Figure 7 shows the effect of the proposed algorithm on a blurry QR code.

Fig. 7
figure 7

Blurred QR Code Processing Results: (a) is a blurred QR code, and (b) is the result of our algorithm for deblurring
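The following sketch illustrates the frame-averaging synthesis described above; the window lengths follow the ranges quoted in the text, while the choice of the central frame as the sharp reference, the OpenCV-based file handling, and the file name are illustrative assumptions.

```python
import cv2
import numpy as np

def synthesize_pair(frames, start, window):
    """Average a window of consecutive frames into a blurred image; use the central frame as ground truth."""
    stack = np.stack(frames[start:start + window]).astype(np.float32)
    blurred = np.clip(stack.mean(axis=0), 0, 255).astype(np.uint8)
    sharp = frames[start + window // 2]            # central frame as the sharp reference
    return blurred, sharp

cap = cv2.VideoCapture("qr_video_240fps.mp4")      # hypothetical file name
frames = []
ok, frame = cap.read()
while ok:
    frames.append(frame)
    ok, frame = cap.read()
cap.release()

# e.g. average 15 consecutive 240-fps frames into one blurred/sharp training pair
blurred, sharp = synthesize_pair(frames, start=0, window=15)
```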

4.2 Experimental Details

The algorithm proposed in this paper is implemented in Python with the PyTorch deep learning framework and runs under the Ubuntu operating system. The computing platform is equipped with an Intel(R) Xeon(R) Gold 6230R CPU @ 2.10 GHz and NVIDIA Tesla T4 GPUs, four GPUs in total with 15,360 MB of memory each. Experiments are carried out on two datasets: the public GOPRO dataset and the QR code dataset built by ourselves. In the GOPRO dataset, 2103 image pairs are used for training and the remaining 1111 for testing. In the QR code dataset, 1316 image pairs are selected for training and the remaining 680 for testing. During training, all training data are randomly cropped to a resolution of \(256\times 256\). The batch size is set to 1, and the Adam optimizer [44] is used for the generator and discriminator separately. The learning rate is initialized to \(10^{-4}\) and, after 150 epochs, decays linearly to 0. To provide more detailed information on GPU memory usage, we recorded the GPU memory consumption at each training step to better understand the algorithm's behavior on real hardware. Training runs for a total of 300 epochs to ensure that the model fully converges and achieves stable performance.
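The optimization setup described above can be sketched as follows; the placeholder networks and the exact shape of the linear decay schedule are assumptions, with only the learning rate, decay start, epoch count, and optimizer taken from the text.

```python
import torch
import torch.nn as nn

# placeholder networks standing in for the generator and discriminator of Sect. 3.1
G = nn.Conv2d(3, 3, 3, padding=1)
D = nn.Conv2d(3, 1, 3, padding=1)

opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

def lr_lambda(epoch):
    # constant learning rate for the first 150 epochs, then linear decay to 0 at epoch 300
    return 1.0 if epoch < 150 else max(0.0, (300 - epoch) / 150.0)

sched_g = torch.optim.lr_scheduler.LambdaLR(opt_g, lr_lambda)
sched_d = torch.optim.lr_scheduler.LambdaLR(opt_d, lr_lambda)

# inside the epoch loop: train on batch-size-1 random 256x256 crops, then step both schedulers
```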

4.3 Experiments on the GOPRO Dataset

The primary objective of this paper is to solve the problem of efficient deblurring in complex dynamic scenes. Traditional methods have limited effectiveness when processing complex motion blur images, so the proposed algorithm is compared in detail with several state-of-the-art methods based on deep learning. To assess the performance of our algorithm, a visual comparison was conducted on the GOPRO dataset. The comparative outcomes are depicted in Fig. 8.

Figure 8 shows the results of processing the blurred images in (a) with different algorithms in three different scenarios. It can be observed that the method of Sun et al. [24], which estimates a blur kernel for image restoration, does not yield a satisfactory overall effect. Nah et al. [25] and Kupyn et al. [26] exhibit artifacts in details such as human portraits and text, and the recovered license plate digits remain noticeably blurred. Kupyn et al. [28] introduce purplish shading in certain regions, while Li et al. [45] achieve impressive results in overall contour restoration but struggle to recover details such as tile gaps. In comparison, our proposed method demonstrates a clear advantage in deblurring capability, particularly in restoring fine textures and preserving edges. This performance is attributed to the improvements in the residual blocks and the novel efficient attention mechanism: the multi-scale residual architecture helps capture contours and fine texture features, while our attention mechanism enhances the representation of critical detail features. These advantages are particularly pronounced in tasks that prioritize the restoration of intricate details, such as the deblurring of QR code images.

Quantitative evaluation is performed on the GOPRO test set, which contains 1111 images, using the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) metrics. Table 2 presents the evaluation results of our algorithm compared to various existing methods [22, 24,25,26, 28, 45] and [23]. Based on the data in Table 2, our algorithm achieves the best performance among all compared methods. Compared to the baseline method [26], our approach raises the PSNR by \(5.7\%\) and the SSIM by \(9.7\%\). These results clearly indicate that our method handles dynamic, complex blurry images effectively and achieves accurate deblurring.

Fig. 8
figure 8

Comparison of our method with results from other methods on the GOPRO dataset. From (a) to (h): Blurred image, [24,25,26, 28, 45], our method, Sharp image

Table 2 Performance comparison on the GOPRO Large dataset

4.4 Experiments on the QR Code Dataset

QR code recognition is mainly affected by motion blur: damage to the version, position, and format information in a QR code directly causes recognition to fail. Restoration of blurred QR codes must therefore pay special attention to texture details, because the degree to which these details are restored directly determines whether the QR code can be recognized. The algorithm proposed in this paper is compared with several recent motion deblurring methods on a manually produced blurred QR code dataset and on a blurred QR code dataset obtained from actual captures. Figure 9 shows the comparison between our algorithm and several motion deblurring algorithms [23, 26, 28, 46] on the artificially produced blurred QR code dataset; from left to right, the three images in Fig. 9a represent slight blur, moderate blur, and severe blur, respectively.

Fig. 9
figure 9

Comparison of our method with results from other methods on the Custom QR dataset. From (a) to (f): Blurred image, [23, 26, 28, 46], our method

First, subjective evaluation by visual inspection was carried out. As can be seen from Fig. 9, for slightly blurred QR codes all five deblurring methods generally achieve effective results. For moderately blurred QR codes, however, the images recovered by the other four methods contain varying degrees of artifacts. For severely blurred QR codes in particular, our method shows the best deblurring results: the texture of the deblurred QR code is clearer and, most importantly, its content can be successfully recognized.

Next, we adopt two objective evaluation metrics, PSNR and SSIM, to evaluate the performance of the proposed method. We selected 600 self-made blurry QR code pictures for testing, and the results are shown in Table 3.

Table 3 Performance Comparison on the Custom QR dataset

As can be seen from the results in the table, our method still achieves the best results in quantitative comparison, and has the highest degree of recovery of blurred QR codes.

Figure 10 shows the comparison between the algorithm proposed in this paper and several motion deblurring algorithms [23, 26, 28] on actually captured QR codes with severe motion blur. As can be seen from the figure, our algorithm performs best in terms of visual deblurring quality and obtains clearer QR code images than the other algorithms. Furthermore, it shows clear advantages in recovering details and textures.

Fig. 10
figure 10

Comparing our method with others on the actually captured blurred QR dataset. From (a) to (e): Blurred image, [23, 26, 28], our method

To further verify the effectiveness of the proposed algorithm on severely motion-blurred QR codes, the open-source ZBar decoding library is used for a quantitative evaluation, measuring both the recognition rate and the recognition time. 600 self-made blurred QR code images and 80 actually captured blurred QR code images were selected for testing. Recognition was performed on the unprocessed blurred QR codes and on the QR codes deblurred by [26], [28], [23], and the proposed algorithm. As shown in Table 4, the recognition rate of the unprocessed blurred QR codes is extremely low, whereas the recognition rate of our algorithm reaches 71.02%, significantly higher than that of the other algorithms and a 387% improvement over the unprocessed case. We also calculated the average processing time per QR code: the proposed algorithm processes and recognizes each QR code in 0.55 s, only half the time required by [26] and 18% faster than [28]. Although the recognition rate of [23] is similar to ours, our processing speed is much higher. And although the processing time is longer than in the unprocessed case, the substantial gain in recognition rate far outweighs this cost.

Table 4 Comparison of Recognition Rate and Average Running Time
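For reference, a minimal sketch of such an evaluation is given below; the paper only states that the open-source ZBar library is used, so the pyzbar Python binding, the timing code, and the success criterion are assumptions made for illustration.

```python
import time
import cv2
from pyzbar import pyzbar

def evaluate(image_paths):
    """Return (recognition rate, average decode time per image) over a list of image files."""
    decoded, total_time = 0, 0.0
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        t0 = time.perf_counter()
        results = pyzbar.decode(img)               # ZBar decoding via the pyzbar binding
        total_time += time.perf_counter() - t0
        if any(r.type == "QRCODE" and r.data for r in results):
            decoded += 1
    n = len(image_paths)
    return decoded / n, total_time / n

# e.g. rate, avg_t = evaluate(list_of_deblurred_image_paths)
```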

4.5 Ablation Experiments

We performed ablation experiments on the image deblurring network presented in this paper to analyze the contributions of each component of our model. Starting from the initial configuration with only the Multi-Scale Feature Extraction Framework (MSFEF), we progressively introduced the improved efficient attention mechanism (IECA), Multi-Scale Dilated Convolution Blocks (MSDM), and the WGAN-Div loss function. We summarize the results of our study in Table 5.

Table 5 Ablation study on individual components of the proposed our model

Additionally, we compare the configuration in which only the IECA module is added to the Multi-Scale Feature Extraction Framework with the configuration in which all other modules are added except the IECA module. The results show that the PSNR and SSIM values of the two configurations are very close. Nevertheless, the qualitative comparison in Fig. 11 shows that the deblurred images obtained with the IECA module are clearer in terms of crucial fine-grained features: they preserve important structures such as human hands while exhibiting fewer artifacts, notably the undesirable arm-shadow phenomenon.

Fig. 11
figure 11

Results of the ablation experiments. From (a) to (d): Blurred image, without the IECA module, with only the IECA module, Deblurred image

5 Conclusion

This paper proposes an innovative algorithm for motion-blurred QR code recognition that combines generative adversarial networks and attention mechanisms, providing an effective solution for applications such as intelligent manufacturing and transportation. A foundational network is constructed from multi-scale residual blocks and multi-scale feature extraction modules to enhance the perceptual capacity of the deblurring network and enrich feature diversity. Dilated convolution layers are introduced to further expand the receptive field without increasing the number of network parameters, so that image feature information over a wider range can be extracted and exploited effectively. An efficient attention mechanism is then introduced to better capture details and texture information within QR code images. In addition, the WGAN-div loss function is applied to the image deblurring task for the first time, improving the quality of generated images by guiding the generative adversarial network to pay more attention to image details during training. Finally, experiments are carried out on the public GOPRO dataset and a custom-built dataset of blurred QR codes. The experimental results show that our method performs well in both quantitative and perceptual terms, with clear advantages especially on QR code images affected by severe motion blur. From improved QR code clarity to the restoration of fine texture, our algorithm performs strongly in practical applications, greatly improving QR code recognition accuracy and providing a more reliable solution for decoding and recognizing QR codes in scenarios afflicted by motion blur.

Future research will focus on further improving the recognition efficiency of the algorithm; there is still room to reduce the QR code recognition time, and we believe lightweight network techniques are a promising way to speed up the recognition process. In addition, we plan to combine small-object detection and localization algorithms with the proposed deblurring algorithm to quickly and accurately locate and recognize blurred QR codes in complex environments. Such a combination could play an important role in application fields such as intelligent logistics and public transportation.