Introduction

Although deep neural network based (DNN-based) object detectors [1,2,3] exhibit outstanding performance surpassing that of humans, they are vulnerable to adversarial examples (AEs) [4,5,6,7,8,9]. Attackers generate AEs by adding indistinguishable perturbations to input images. When a detector encounters an AE, it may produce incorrect outputs, thereby revealing its security vulnerabilities. Investigating the mechanisms behind AEs has emerged as a viable approach to promoting the security of DNN-based detectors and evaluating their robustness.

Universal adversarial perturbation (UAP) [10, 11] is a specific type of AE that differs from perturbations generated for individual images. Instead of optimizing a perturbation per image, attackers train a UAP on a small dataset, and the pre-trained UAP can then be used directly for robustness testing. The term "universal" refers to the input-agnostic nature of UAP: the same adversarial perturbation is added at test time to generate an AE regardless of the input image. These characteristics make UAP better suited to the reliability evaluation of DNNs than other AEs.

In real-world scenarios, adversaries have only limited access to a black-box model’s parameters. To launch an attack, adversaries adopt two strategies: query-based and transfer-based methods. Query-based methods optimize AEs based on the correspondence between inputs and outputs, but they require numerous queries, making them impractical. In comparison, transfer-based methods generate AEs on a white-box surrogate model and test them on the black-box model. However, the attack transferability of AEs generated on a surrogate model declines significantly when they are tested on a black-box model [12,13,14]. Consequently, enhancing the attack transferability of AEs across different detectors has become a pressing problem for transfer-based methods.

Structural disparities among detectors are the main hindrance to attack transferability across different detectors. DNN-based detectors generally consist of backbone networks, feature aggregation layers, detection heads, and post-processing components. Remarkably, even identical components of different detectors can display significant structural differences. As the detection process advances, detectors extract increasingly abstract information from the images. Variations in detection heads and post-processing components lead to notable differences in the abstract information extracted by different detectors. While certain approaches utilize optimization strategies tailored to specific detector components or outputs [14,15,16], these methods may result in sub-optimal AEs when applied to other detectors.

Despite variations in detection heads and post-processing components among detectors, their backbone networks and feature aggregation layers are usually similar or share common structures. For example, ResNet-like [17] networks with residual connections are commonly utilized for feature extraction, followed by pyramid-like structures [18] for aggregating features of different scales. These intermediate features extracted by such structures are found to be universal and non-specific to a particular model [19, 20], meaning that similar features might be identified by other detectors as well. Therefore, optimizing perturbations to disrupt these features can enhance the transferability. However, completely disrupting all intermediate features may hinder attack performance because detectors heavily rely on a few crucial features for detection, while other features are model-specific or redundant [21]. Consequently, the extraction of universal features has become a key challenge. Some methods estimate feature importance by calculating average activation using gradient backpropagation, but they often encounter gradient saturation [22, 23], leading to biased estimation.

Fig. 1 The demonstration of the deep information bottleneck (DIB)

To address this issue, we introduce the deep information bottleneck (DIB) [24] into AEs and extract the intermediate features that significantly impact detection. Figure 1 illustrates the DIB. Ideally, the DIB considers the DNN as a Markov chain: data X passes through the chain, gets compressed and encoded as the intermediate feature Z, and ultimately produces the output Y. During training, the DNN strives to extract useful and crucial information from X, compress redundant information in Z, and increase the mutual information I(Z;Y) between Z and the label Y. However, information loss inevitably occurs in the process, leading to the following relationship among the mutual information of X, Y, and Z:

$$\begin{aligned} I(X;Z) \ge I(Z;Y) \end{aligned}$$
(1)
$$\begin{aligned} \min \limits _{Z} I(X;Z)-\beta \cdot I(Z;Y) \end{aligned}$$
(2)

Equation (1) states the data processing inequality: the information that Z carries about the label Y cannot exceed the information it carries about the input X. The optimization objective of the DNN is to lessen the gap between the two sides of this inequality. Equation (2) expresses the optimization of the DNN from the perspective of the DIB.
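To make the inequality in Eq. (1) concrete, the short numerical sketch below (our own illustration, not part of the DIB formulation) builds a toy Markov chain X → Z → Y with discrete variables and checks that I(X;Z) ≥ I(Z;Y):

```python
import numpy as np

def mutual_information(joint):
    """I(A;B) in bits, computed from a 2-D joint probability table p(a, b)."""
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / (pa @ pb)[mask])).sum())

# Toy Markov chain X -> Z -> Y:
# X is uniform over {0,1,2,3}; Z = X // 2 compresses X to one bit;
# Y equals Z but is flipped with probability 0.1 (imperfect prediction).
p_x = np.full(4, 0.25)
p_xz = np.zeros((4, 2))
for x in range(4):
    p_xz[x, x // 2] = p_x[x]

flip = 0.1
p_zy = np.zeros((2, 2))
for z in range(2):
    p_z = p_xz[:, z].sum()
    p_zy[z, z] = p_z * (1 - flip)
    p_zy[z, 1 - z] = p_z * flip

i_xz = mutual_information(p_xz)   # 1.000 bit
i_zy = mutual_information(p_zy)   # about 0.531 bit
print(f"I(X;Z) = {i_xz:.3f}, I(Z;Y) = {i_zy:.3f}")
assert i_xz >= i_zy               # the inequality of Eq. (1)
```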

Fig. 2 The pipeline of generating DIB-UAP

$$\begin{aligned} \max \limits _{\epsilon }\Vert \epsilon \Vert +\beta \cdot I(Z+\epsilon ;Y) \end{aligned}$$
(3)

We employ the DIB to extract crucial intermediate features from the detector. Specifically, we freeze the detector’s weights and introduce a variable \(\epsilon \) to the intermediate feature Z, striking a balance between the detector’s accuracy and the variable’s amplitude. As shown in Eq. (3), maximizing the amplitude of \(\epsilon \) added to the intermediate feature unavoidably reduces the mutual information between \(Z+\epsilon \) and Y, leading to a decrease in detection accuracy. To achieve this balance, the amplitude of \(\epsilon \) is allowed to differ across channels, resulting in decreased amplitudes on crucial features and increased amplitudes on redundant ones. The magnitudes of these amplitudes thus indicate the significance of the features. The extraction process is illustrated on the left side of Fig. 2. We disrupt the crucial features extracted through the DIB and name this approach DIB-UAP. The training process of DIB-UAP is depicted on the right side of Fig. 2.

Fig. 3 Different UAPs. a Daedalus. b GD-UAP. c Patch unit in DIB-UAP. d DIB-UAP

While analyzing other UAP methods [10, 11, 25], we observe a common characteristic: a UAP displays local textural patterns, with a similar texture repeated across the perturbation like tiles. As depicted in the yellow and red boxes in Fig. 3a, b, different parts of the same UAP exhibit similar textures.

The texture characteristic displayed by UAPs is primarily influenced by the structure of detectors. Detectors extract features from images in a hierarchical manner, with the receptive field of neurons expanding as the number of layers increases. As the network deepens and feature maps shrink in size, image details gradually diminish. Detectors then aggregate information from various receptive fields, thereby obtaining global image information. Small objects, which carry more detailed information, are generally detected on large-scale feature maps, while medium and large objects are detected on small feature maps.

If the attack target is to generate false positive instances of a specific category, similar textures will form within the receptive field region during the optimization of the UAP. Generating textures at the scale of the receptive field aligns with the detection mechanism of the detectors. The results indicate that patches generated in this way interfere strongly with small objects but perform only moderately against medium and large objects. This discrepancy arises because medium and large objects typically require information from multiple receptive fields for representation within the DNN, whereas the UAP optimization process lacks sufficient interaction among different receptive fields, so medium and large objects are still detected accurately. If the patch size is too small, it causes inadequate interference in the deep features of the network, further limiting the transferability of the UAP. It is therefore imperative to increase the information interaction between different receptive fields in the UAP and maintain a larger size to enhance the UAP’s interference capability against medium and large objects.

We propose the Scale & Tile augmentation method, which utilizes patch units as the basic blocks for tiling, with patch unit sizes close to the receptive field. Additionally, we amplify the patches during the optimization process. This approach aims to enable patches to scale and influence one or multiple receptive fields during iteration, thus enhancing the interference capability against medium and large objects.

The contributions of this paper are summarized as follows:

  1. We propose DIB-UAP, an intermediate feature disruption attack method based on the deep information bottleneck (DIB). DIB-UAP is a versatile and generic attack method that can be applied to various object detectors, both single-stage and two-stage, without limitations imposed by detection heads or post-processing components.

  2. To improve attack transferability against medium and large objects, we propose the Scale & Tile (S&T) data augmentation method. This approach involves scaling and tiling during the training of DIB-UAP, leading to a significant enhancement in its effectiveness for attacking medium and large objects.

  3. We evaluate the transferability of DIB-UAP against eight comparison methods across four mainstream black-box models. The experiments demonstrate that DIB-UAP outperforms the second-ranked method by 5.2% in attack transferability. Additionally, we conduct experiments on a commercial platform to evaluate its real-world feasibility.

Related works

Our work involves three aspects: DNN-based object detectors, adversarial examples, and the deep information bottleneck theory.

DNN-based object detector

DNN-based object detectors can be categorized into single-stage and two-stage models based on the way they generate region proposals.

Faster R-CNN [2] serves as a representative algorithm among two-stage object detectors. In the first stage, the region proposal network (RPN) generates multiple region proposals on the feature map and filters out proposals containing no objects, effectively reducing the number of region proposals and the computational complexity. In the second stage, the coordinates of the filtered region proposals are fine-tuned, and their respective classes are determined. Numerous subsequent methods have further enhanced Faster R-CNN through techniques such as pyramid networks [14], cascading networks [26], negative sampling [27], and multi-scale training [28, 29], resulting in further accuracy improvements. Due to their robustness and accuracy, Faster R-CNN and its subsequent improvements remain widely applicable in various practical scenarios.

The YOLO series algorithms [30,31,32,33,34] are exemplary methods among single-stage object detectors. They divide the feature map into \(S \times S\) grids and, in each grid, predict proposal coordinates, object class, and confidence. The confidence represents the probability that a proposal contains an object. Subsequent research has consistently improved YOLO in the backbone network [33] and the detection head [34]. Owing to their simplicity and high accuracy, YOLO algorithms are widely adopted in practice, exceeding expectations even in tasks beyond object detection [35, 36]. Other common single-stage detection models such as RetinaNet [37] and SSD [38] adopt the same approach as YOLO of predicting directly on the feature map.

With the continued improvement of object detection tasks, numerous detectors have been proposed, resulting in a relatively high level of detection accuracy. Considering their varied real-world applications, we choose Faster R-CNN, YOLO X, RetinaNet, and SSD as black-box victim models to test attack transferability.

Adversarial examples

Szegedy et al. [4] define an adversarial example (AE) as input data with imperceptible perturbations that lead to incorrect outputs from the DNN. Xie et al. [39] propose AEs specifically for object detection, optimizing perturbations to manipulate the model’s classification results. Wang et al. [14] design an AE called Daedalus that targets the NMS post-processing component. Wei et al. [25] discover the self-universality of adversarial perturbations, generating transferable AEs by approximating intermediate features between local and global perturbations.

While AEs generated on white-box surrogate models can also attack black-box victim models, their transferability significantly decreases. Consequently, several methods have been proposed to enhance AE’s transferability, including image augmentation [40, 41] and model ensembles [41, 42].

Initially, adversarial perturbations were tailored to single images; because of this reliance on individual inputs, such perturbations are impractical. To tackle the problem, Moosavi-Dezfooli et al. [10] proposed the universal adversarial perturbation (UAP), which generates a perturbation that can attack any input image. UAP’s input-agnostic characteristic makes it suitable for a wider range of scenarios. Motivated by [10], researchers have conducted more in-depth work on UAP. Wu et al. [12] introduce the generic universal adversarial perturbation (G-UAP) into object detection tasks, attacking the foreground and background classification results of the RPN network. Mopuri et al. [13] propose the Generalizable Data-free universal adversarial perturbation (GD-UAP), which generalizes across multiple vision tasks by disrupting features extracted at intermediate layers. Feature Disruption UAP (FD-UAP) [43] focuses on model-specific features. It divides the intermediate features channel-wise into significant and less significant features through a mean-based strategy and a non-zero-count-based strategy. By simultaneously weakening the significant features and strengthening the less significant ones, FD-UAP gains powerful transferability. Feature Gathering UAP (FG-UAP) [44] exploits the phenomenon of neural collapse in DNNs and targets the layers prone to neural collapse, optimizing the UAP in layers with low intra-class variance. This makes the perturbations more likely to cross decision boundaries, thereby enhancing transferability. Liu et al. [45] propose stochastic gradient aggregation UAP (SGA-UAP), which employs small-batch training to conduct multiple iterations of inner pre-search. Subsequently, all inner gradients are aggregated as a one-step gradient estimation to enhance gradient stability.

Compared to traditional adversarial perturbations, UAPs offer a broader range of attack scenarios. However, the significant variations in detection heads and post-processing components among different detectors greatly impede the transferability of UAPs. In contrast, DIB-UAP is specifically designed to disrupt the intermediate features extracted by the backbone, which is more universally shared across detectors, thus providing greater universality and enhancing transferability. We compare DIB-UAP with other feature disruption methods and find that methods attacking all features do not surpass the performance of DIB-UAP, which selectively targets a subset of crucial features. This finding verifies the potential drawback of disrupting all features, as it may lead to suboptimal results.

Deep information bottleneck

The information bottleneck originates from rate-distortion theory in information theory. It functions as a quantitative analysis approach that balances the trade-off between source-coding compression and fidelity in learning tasks. Its formal representation can be summarized as follows:

$$\begin{aligned} L[p({\tilde{x}} \vert x)]=I({\tilde{X}};X)-\beta \cdot I({\tilde{X}};Y) \end{aligned}$$
(4)

\({\tilde{X}}\) denotes the compressed encoding of X.

Deep information bottleneck (DIB) [24] extends the information bottleneck theory [46], which proposes that optimizing DNN involves information bottleneck processing. Its objective is to compress input data while preserving its valuable information. DIB has broad applications, such as data compression [47], out-of-distribution detection [48], and causal intervention [49, 50]. In DIB-UAP, we utilize DIB to extract more generalizable intermediate features and generate UAP to disrupt these extracted features, thereby boosting the transferability of adversarial attacks.

Method

Background

After an image is input into the detector, it generates a vector \(P=\{p_{i}:(d_{i},o_{i},c_{i});i=1,2,...,n\}\), which represents the object proposals within the image. Here, \(d_{i}\) represents the coordinates of the i-th proposal, \(o_{i}\) indicates the confidence score of \(p_{i}\), reflecting whether it contains an object, and \(c_{i}\) denotes the class probability. The values of both \(o_{i}\) and \(c_{i}\) fall within the range of 0 to 1.

Detectors often produce dense predictions for the same object and typically employ non-maximum suppression (NMS) to eliminate redundancies. In the case of multiple bounding boxes for the same object, NMS calculates the Intersection over Union (IoU) overlap between the box with the highest confidence and the others. If the IoU value surpasses the predefined threshold, it indicates overlapping detection results between the two boxes, subsequently eliminating the box with lower confidence. Conversely, if the IoU value falls below the threshold, the two boxes represent distinct objects. Then, the most probable class for the object within the box is outputted. The calculation method for IoU is depicted in Fig. 4.
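As a concrete reference for the IoU computation in Fig. 4 and the NMS procedure described above, the following is a minimal sketch (our own illustration; the (x1, y1, x2, y2) box format and the function names are assumptions):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-confidence box and drop boxes overlapping it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```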

The attack goal is to optimize a UAP \(\delta \) on the white-box surrogate detector \(f_{s}\) such that, for any given input \(x \in X\), the following holds on the black-box victim detector \(f_{v}\):

$$\begin{aligned} f_{v}(x+\delta ) \ne y \end{aligned}$$
(5)

To uphold visual imperceptibility, we constrain the magnitude of \(\delta \) to be within 20/255, aligning with the criteria followed by other approaches.
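In a PyTorch-style implementation, this bound can be enforced by projecting \(\delta \) back into the \(L_{\infty }\) ball after every update step (a minimal sketch, assuming the perturbation is stored as a tensor in the [0, 1] image scale):

```python
import torch

EPS = 20.0 / 255.0  # imperceptibility bound used throughout the paper

def project(delta: torch.Tensor) -> torch.Tensor:
    """Project the UAP back into the L-infinity ball of radius 20/255."""
    return delta.clamp_(-EPS, EPS)
```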

Fig. 4 The calculation of Intersection over Union (IoU)

Extraction via DIB

For a well pre-trained detector, we consider its weights to be optimal for the given dataset. After freezing the detector’s weights, we introduce the variable \(\epsilon \) to the features of the various channels in the backbone network. Subsequently, the surrogate detector’s performance may decline. Our objective is to preserve the detector’s accuracy while retaining the summed magnitude of \(\epsilon \). To achieve this goal, we allow \(\epsilon \) to differentiate across channels while maintaining its overall amount constant. It is worth noting that the amplitude of \(\epsilon \) gradually decreases on channels with universal information, whereas it increases on redundant channels. We identify channels with lower amplitude as crucial features and disrupt them. The process is shown in Eq. (3). In our implementation, we translate the optimization objectives in Eq. (3) into two distinct losses, thereby converting Eq. (3) into Eq. (6). In Eq. (6), \(L_{\epsilon }\) controls the amplitude, while \(L_{Det}\) helps maintain model accuracy.

$$\begin{aligned} \max \limits _{\epsilon } L_{\epsilon } - L_{Det} \end{aligned}$$
(6)

We set \(L_{\epsilon }\) to optimize the amount of \(\epsilon \) at a specific layer. Assuming the feature dimension extracted by this layer is \(B\times C\times H\times W\), the dimension of \(\epsilon \) is \(1\times C\times 1\times 1\). The amplitude of \(\epsilon \) is controlled by the variable v, which is initially set to 1. \(normal(\mu =1, \sigma =1)\) denotes a normal distribution with a mean and variance of 1. It can be expressed as follows:

$$\begin{aligned} \epsilon = ReLU(v) \cdot normal(\mu =1, \sigma =1) \end{aligned}$$
(7)
$$\begin{aligned} L_{\epsilon } = (\epsilon -1)^2/C \end{aligned}$$
(8)

Consistent with our attack goal of increasing false positive objects, \(L_{Det}\) consists of the classification loss \(L_{cls}\) and the confidence (objectness) loss \(L_{obj}\). In the experiment, we utilize YOLO v3 [32] as the surrogate model, with output \(P=\{p_{i}:(d_{i},o_{i},c_{i});i=1,2,...,n\}\). Therefore, \(L_{cls}\) and \(L_{obj}\) are determined as in Eqs. (9) and (10):

$$\begin{aligned} L_{cls} = \sum _{i=0}^{S^{2}}\sum _{j=0}^{B} I_{i,j}^{obj}\sum _{c\in classes}{\hat{p}}_{i}(c)\log (p_{i}(c)) \end{aligned}$$
(9)
$$\begin{aligned} L_{obj} = \lambda _{noobj}\sum _{i=0}^{S^{2}}\sum _{j=0}^{B} I_{i,j}^{noobj}(o_{i}-{\hat{o}}_{i})^2 + \lambda _{obj}\sum _{i=0}^{S^{2}}\sum _{j=0}^{B} I_{i,j}^{obj}(o_{i}-{\hat{o}}_{i})^2 \end{aligned}$$
(10)
$$\begin{aligned} L_{Det} = L_{cls} + L_{obj} \end{aligned}$$
(11)

\(\lambda \) represents the loss coefficients and remains consistent with the values used in YOLO v3. When the bounding box at \(grid_{(i,j)}\) contains an object, the value of \(I_{i,j}^{obj}\) is set to 1; otherwise, it is set to 0. \(p_{i}(c)\) denotes the predicted probability of the object belonging to class c, and \({{\hat{p}}}_{i}(c)\) represents the true class label. \(o_{i}\) denotes the confidence score and \({{\hat{o}}}_{i}\) is its true label.
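A condensed PyTorch-style sketch of this first stage is given below. It is illustrative only: the hook placement, the `detection_loss` helper standing in for the YOLO v3 loss of Eqs. (9)–(11), and the data-loading details are assumptions rather than the released implementation.

```python
import torch
import torch.nn.functional as F

def learn_channel_importance(detector, layer, loader, detection_loss,
                             channels, lr=1e-2, device="cuda"):
    """Stage 1 of DIB-UAP: learn the per-channel amplitude v of Eqs. (6)-(8).

    `detector` is the frozen, pre-trained surrogate model, `layer` is the
    backbone layer whose C-channel feature map receives epsilon, and
    `detection_loss` maps detector outputs and targets to L_Det.
    Channels with the smallest learned v are treated as the crucial features.
    """
    for p in detector.parameters():              # freeze the detector's weights
        p.requires_grad_(False)
    v = torch.ones(1, channels, 1, 1, device=device, requires_grad=True)
    state = {}

    def add_epsilon(module, inputs, output):
        # epsilon = ReLU(v) * noise with noise ~ N(mu=1, sigma=1), Eq. (7)
        state["eps"] = F.relu(v) * (torch.randn_like(v) + 1.0)
        return output + state["eps"]

    handle = layer.register_forward_hook(add_epsilon)
    opt = torch.optim.Adam([v], lr=lr)
    for images, targets in loader:
        outputs = detector(images.to(device))
        l_eps = ((state["eps"] - 1.0) ** 2).mean()   # Eq. (8)
        l_det = detection_loss(outputs, targets)     # Eqs. (9)-(11)
        loss = l_det - l_eps                         # minimize L_Det - L_eps, i.e. maximize Eq. (6)
        opt.zero_grad(); loss.backward(); opt.step()
    handle.remove()
    return v.detach()
```

The n channels with the smallest entries of the returned v are then selected as the crucial-feature set attacked in the second stage.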

Optimization

By utilizing the DIB, we extract crucial features and disrupt them. Our goal is to obtain a UAP \(\delta \) that meets the following criterion:

$$\begin{aligned} \mathop {\arg \min }\limits _{\delta } \sum _{x \in X} \big (\lambda _{F} \cdot L_{F}(x+\delta )+L_{adv}(x+\delta )\big ) \end{aligned}$$
(12)

The DIB-UAP’s optimization is decided by two losses: \(L_{F}\) and \(L_{adv}\). \(L_{F}\) is specifically designed to disrupt the selected intermediate feature, while \(L_{adv}\) is a loss tailored for the detection component.

$$\begin{aligned} L_{F} = e^{-\vert f_{s}(x+\delta ;n)-f_{s}(x;n)\vert } \end{aligned}$$
(13)

Equation (13) describes \(L_{F}\), which aims to amplify the disparity between the adversarial and benign inputs on the selected intermediate feature channels. Here, n denotes the top-n channels with the smallest values of v obtained in the previous stage.
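A possible PyTorch-style rendering of Eq. (13) is sketched below (our illustration; how the intermediate features are read out of the surrogate and the channel-index convention are assumptions):

```python
import torch

def feature_disruption_loss(feat_adv, feat_clean, crucial_channels):
    """L_F of Eq. (13): exp(-|f_s(x+delta; n) - f_s(x; n)|), averaged over the
    crucial channels selected by the DIB stage. Minimizing this loss pushes
    the adversarial features away from the benign ones on those channels."""
    diff = (feat_adv[:, crucial_channels] - feat_clean[:, crucial_channels]).abs()
    return torch.exp(-diff).mean()
```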

$$\begin{aligned} L_{adv} = -\text {log}(o \cdot c) \cdot h \cdot w / (H \cdot W) \end{aligned}$$
(14)

Many attack approaches [14,15,16, 39] have focused on the classification branch of the detector, achieving significant attack transferability. Following their approach, we specifically design \(L_{adv}\) for this branch. The object’s class is determined by two variables: the confidence score o and the targeted class probability c. Only when o exceeds a pre-defined threshold are the proposals recognized as foreground objects; otherwise, they are classified as background. Once identified as foreground objects, their classification probability is calculated as \(o \cdot c\).

Fig. 5 \(L_{adv}\) and its derivative

We optimize \(o \cdot c\) through \(L_{adv}\) to bring it closer to 1. However, most candidate bounding boxes have a low o, making optimization challenging. Therefore, we amplify their gradients using the logarithmic function. This amplification provides a larger gradient for smaller values of \(o \cdot c\), accelerating the convergence of the optimization process. Figure 5 illustrates \(L_{adv}\) and its derivative.

Furthermore, we optimize the height h and width w of the proposals. This optimization aims to minimize the bounding box size, thereby reducing the IoU between the boxes. As a result, false positive outputs are generated, leading to a decrease in detection accuracy.
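Putting the pieces together, one optimization step of DIB-UAP might look like the sketch below. It is illustrative only: `detector_features` and `detector_proposals` are hypothetical helpers for reading the selected intermediate features and the proposal attributes (o, c, h, w) from the surrogate, and `feature_disruption_loss` is the Eq. (13) sketch above.

```python
import torch

def adv_loss(obj, cls, h, w, H, W):
    """L_adv of Eq. (14): pushes o*c toward 1 and shrinks the proposal size."""
    return (-torch.log(obj * cls + 1e-8) * h * w / (H * W)).mean()

def dib_uap_step(delta, images, surrogate, crucial_channels, optimizer,
                 lambda_f=1e-2, eps=20.0 / 255.0):
    """One gradient step on the UAP (Eq. (12)), followed by the L-inf projection."""
    feat_clean = detector_features(surrogate, images).detach()       # assumed helper
    feat_adv = detector_features(surrogate, images + delta)
    l_f = feature_disruption_loss(feat_adv, feat_clean, crucial_channels)

    obj, cls, h, w = detector_proposals(surrogate, images + delta)   # assumed helper
    H, W = images.shape[-2:]
    l_adv = adv_loss(obj, cls, h, w, H, W)

    loss = lambda_f * l_f + l_adv
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    with torch.no_grad():
        delta.clamp_(-eps, eps)        # keep the perturbation within 20/255
    return loss.item()
```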

Scale & Tile

The current UAPs exhibit localized texture patterns that are relatively small. As the detection proceeds, the interference of these small textures on deep features decreases. This results in a decline in the attack transferability of UAPs against medium to large-sized objects, as detectors carry out detection on varying scales of features, e.g. detecting medium and large objects on small feature maps.

Fig. 6 The process of Scale & Tile

Hence, we enhance the attack transferability of DIB-UAP against medium to large-sized objects by enlarging the size of its localized texture patterns. However, controlling the pattern size through the loss function presents a challenge. Thus, we propose utilizing data augmentation techniques to achieve this goal. Leveraging the observed texture appearance of UAPs, we train DIB-UAP on a patch-unit basis.

Our approach can be summarized as "Scale & Tile", as shown in Fig. 6. Assuming the training images are resized to \(M*M\), with an initial patch unit size of \(m*m\), we multiply the unit size by a factor of n and use the enlarged \(mn*mn\) units to tile the image. Specifically, we calculate the size multiple \(k=\lceil M/m \rceil \) between the image and the initial patch unit. Within the range of [0, k], we find the greatest common factor Q and its set of divisors N. An integer n is randomly selected from N as the scale factor for the patch unit. Next, we use the enlarged \(mn*mn\) unit to generate a patch of size \(mQ*mQ\). For the remaining area, we utilize the original patch unit to fill the \(M-mQ\) border and continue tiling until the training image is fully covered by patch units. This approach keeps the patch unit in DIB-UAP at its minimum size while achieving multi-scale training and improving the attack performance of DIB-UAP on objects of different sizes. During testing, we tile the patch units k times along one side and extract an area of size \(M*M\) as the DIB-UAP.
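The sketch below shows one possible reading of this procedure (our interpretation: a random integer scale factor n dividing \(k=\lceil M/m \rceil \) is drawn, the scaled unit is tiled over the image, and the leftover border is filled with the unscaled unit; the exact choice of Q and N may differ in the released code):

```python
import random
import torch
import torch.nn.functional as F

def scale_and_tile(unit: torch.Tensor, image_size: int) -> torch.Tensor:
    """Build an image-sized perturbation from a (C, m, m) patch unit."""
    c, m, _ = unit.shape
    k = -(-image_size // m)                       # ceil(M / m)
    n = random.choice([d for d in range(1, k + 1)
                       if k % d == 0 and m * d <= image_size])

    # Upsample the unit to (mn, mn) and tile it over the image.
    scaled = F.interpolate(unit.unsqueeze(0), size=(m * n, m * n),
                           mode="bilinear", align_corners=False)[0]
    reps = image_size // (m * n) + 1
    canvas = scaled.repeat(1, reps, reps)[:, :image_size, :image_size].clone()

    # Fill the border not covered by whole scaled units with the original unit.
    covered = (image_size // (m * n)) * (m * n)
    if covered < image_size:
        small_reps = image_size // m + 1
        small = unit.repeat(1, small_reps, small_reps)[:, :image_size, :image_size]
        canvas[:, covered:, :] = small[:, covered:, :]
        canvas[:, :, covered:] = small[:, :, covered:]
    return canvas
```

During training, the returned canvas would be added to the resized input images; at test time, as described above, the patch units are tiled and an \(M*M\) area is extracted as the final DIB-UAP.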

Experiments

In this section, we introduce the experimental setting and results of DIB-UAP.

Experimental settings

Detectors We utilize several widely employed object detectors in the experiment, namely YOLO v3 [32], SSD300 [38], RetinaNet [37], Faster R-CNN [2], and YOLO X-small [34]. Among these detectors, we specifically opt for YOLO v3 as the white-box surrogate detector to train DIB-UAP and assess the transferability on the other detectors. Table 1 presents the detector structures and their precision on the VOC2012 and COCO2017 datasets.

Table 1 The detectors in the experiment
Table 2 The comparison methods in the experiment
Table 3 Results on black-box detectors

Dataset and evaluation metrics We train the DIB-UAP on the VOC2012 train set, consisting of 5717 images and 20 classes. In real-world scenarios, attackers cannot access the black-box detectors’ data domain, which is usually much larger than the data used by attackers. To better align with practical situations, we evaluate the attack performance on both the VOC2012 and COCO2017 validation sets. The COCO2017 dataset contains 118K training images and 5K validation images, surpassing the scale of VOC2012.

Table 4 Effect of different components

In object detection, mean Average Precision (mAP) is a widely used evaluation metric of a detector’s accuracy. The accuracy is typically determined through two aspects: classification and localization. When the classification is correct, the accuracy of object localization is assessed by the Intersection over Union (IoU) between the detection box and the ground-truth box. In the best-case scenario, an IoU value of 1 signifies perfect alignment between the predicted bounding box and the ground truth. However, achieving such precise alignment is extremely challenging. As a result, various IoU thresholds are used to evaluate the accuracy of object localization. Typically, a minimum IoU threshold of 0.5 (mAP@0.5) is employed, which is also commonly utilized on the VOC2012 dataset.

Fig. 7 The impact of different patch unit sizes

Hence, we measure transferability by examining the reduction in mAP@0.5. Additionally, to analyze the impact of unit size on transferability across object scales, we compare the mAP for different object sizes, represented as mAP\(\mid \)small, mAP\(\mid \)medium, and mAP\(\mid \)large, corresponding to target areas smaller than \(32^2\), between \(32^2\) and \(96^2\), and larger than \(96^2\), respectively.
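For reference, this is the usual COCO-style area split; a small helper like the following (hypothetical, for illustration) reproduces the bucketing:

```python
def size_bucket(box_area: float) -> str:
    """Assign a ground-truth box to the small/medium/large mAP bucket."""
    if box_area < 32 ** 2:
        return "small"
    if box_area < 96 ** 2:
        return "medium"
    return "large"
```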

Our implementation and comparative methods In our implementation, the DIB-UAP has an initial size of 128 \(\times \) 128. The batch size is set to 16, and coefficient \(\lambda _{F}\) is set to 1. During training, we utilize Scale & Tile to adjust the perturbation size based on the input image. We use the Adam optimizer with an initial learning rate of 0.03. The learning rate is decayed by 0.97 when the loss changes by less than 1e-4. The training comprises 15 epochs.
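As a rough PyTorch equivalent of this training setup (our approximation: `ReduceLROnPlateau` with factor 0.97 and a 1e-4 threshold plays the role of the described decay rule, and `train_one_epoch` is an assumed helper wrapping the DIB-UAP losses):

```python
import torch

delta = torch.zeros(3, 128, 128, requires_grad=True)        # initial 128 x 128 UAP
optimizer = torch.optim.Adam([delta], lr=0.03)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.97, threshold=1e-4, patience=0)

for epoch in range(15):                                      # 15 training epochs
    epoch_loss = train_one_epoch(delta, optimizer)           # assumed helper
    scheduler.step(epoch_loss)                               # decay lr when loss plateaus
```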

The comparative methods consist of G-UAP [12], GD-UAP [13], Daedalus [14], DAG [39], SU [25], FG-UAP [44], FD-UAP [43], and SGA-UAP [45]. G-UAP, GD-UAP, FG-UAP, FD-UAP, and SGA-UAP are UAP-based methods, while Daedalus, DAG, and SU are input-specific adversarial perturbations. We implement Daedalus, DAG, and SU within the UAP framework by optimizing the UAP with their respective loss functions. All methods are generated using YOLO v3 as the surrogate detector. To preserve more proposals, the confidence score threshold is set to 0.1 and the IoU threshold is set to 1 during optimization. The methods are summarized in Table 2, and detailed descriptions and comparisons are provided in “Experimental result”.

Experimental result

We compare the DIB-UAP and other methodologies. The experimental results are shown in Table 3.

We conduct a comprehensive comparison of all methods on four black-box detectors. On the VOC2012 dataset, DIB-UAP achieves an average decline that exceeds the second-ranked FD-UAP method by 5.2%. Similarly, on the COCO2017 dataset, DIB-UAP outperforms the second-ranked SU method by 1.5%. Importantly, the UAP generated on the VOC dataset also demonstrates impressive results on the COCO dataset, suggesting cross-domain transferability for all methods. The relative gaps between different methods are similar on the two datasets, hence our primary focus is on presenting the results obtained on the VOC dataset.

DIB-UAP achieves good attack transferability regardless of the backbone network employed by the detector, such as VGG16 or ResNet50. It performs comparably to Daedalus, FG-UAP, SGA-UAP, and FD-UAP, while other methods such as G-UAP, GD-UAP, SU, and DAG exhibit inferior transferability. We discuss each method separately below.

G-UAP and DAG are approaches that solely target the classification branch for attack. The results reveal their poor performance, indicating that substantial structural disparities between detectors can render attacks ineffective under black-box conditions when attacking a specific model architecture. FG-UAP targets susceptible network layers prone to neural collapse, showing significantly improved transferability compared to G-UAP and DAG. However, it still does not match the effectiveness of feature disruption methods.

SGA-UAP shows transferability across heterogeneous detectors through its pre-search for stable optimization directions. However, the frequent pre-searching diminishes the transferability of SGA-UAP to detectors with structures similar to the surrogate detector, as evidenced by its transferability to detectors like YOLO X.

Table 5 Effect of different attack layers
Fig. 8 Detection results of benign images and DIB-UAP. a, c Benign images; b, d DIB-UAP

Fig. 9 Detection results on the commercial platform. a, d Benign images; b, e commercial detection platform \(\#1\); c, f commercial detection platform \(\#2\)

GD-UAP specifically focuses on disrupting the ResNet50 network. However, its performance is also subpar. One plausible reason is that different backbone networks might be influenced by subsequent detection modules, leading to variations in extracted features. These features could contain a higher amount of redundant information, resulting in suboptimal outcomes when subjected to attacks.

The YOLO X-s exhibits poorer robustness due to its lower parameter count and structural similarity to the surrogate detector YOLO v3. Hence, all methods achieve high attack success rates on this detector.

Daedalus is specifically designed for the NMS component and achieves good performance on single-stage models like SSD and RetinaNet. However, its performance is poor on the two-stage model, Faster R-CNN. Since Faster R-CNN generates final detection results through two stages of processing, optimizing only one stage of the detector may not effectively impact the other stage, resulting in a decrease in attack performance. As a result, Daedalus, which attacks the NMS located at the end of the detection pipeline, exhibits a significant decline in performance on Faster R-CNN compared to single-stage detectors.

SU primarily focuses on attacking the features and aims to approximate the similarity between local and global images in the feature space. It performs better on Faster R-CNN compared to Daedalus. However, its method of computing similarity between local regions and global features does not work well for medium to large-sized objects, resulting in lower transferability compared to DIB-UAP.

Like DIB-UAP, FD-UAP disrupts features extracted from the backbone network at the channel level. Nevertheless, its transferability to Faster R-CNN and YOLO X detector is lower than that of DIB-UAP. This suggests that the statistically-based method for identifying significant features may not be suitable for all detector architectures.

In “Ablation experiments”, we will compare the impact of different patch unit sizes of DIB-UAP on medium and large objects.

Ablation experiments

In this section, we conduct ablation experiments to examine the effects of the different loss functions, S&T, and the coefficient \(\lambda _{F}\). Furthermore, we explore how varying patch unit sizes of S&T affect medium and large targets. We also investigate the transferability of attacks on different feature layers. To compare the results, we conduct experiments using Faster R-CNN as the black-box detector on the VOC dataset.

Study on different components and \(\lambda _{F}\)

We analyze the impact of the different loss functions, the S&T augmentation, and the coefficient \(\lambda _{F}\) on the transferability of DIB-UAP. The results are demonstrated in Table 4.

When optimizing solely with \(L_{adv}\), DIB-UAP causes a 17.1% decline in mAP. The results in Table 4 highlight that the adoption of the S&T augmentation significantly enhances transferability, leading to an overall increase of 11.7%. These findings indicate that the different components all contribute to gains in attack transferability.

Additionally, incorporating \(L_{F}\) into the optimization further boosts the transferability to 39.0%. When \(\lambda _{F}\) is small (1e−3), there is minimal improvement in transferability. A \(\lambda _{F}\) of 1e−2 yields the highest transferability, and increasing \(\lambda _{F}\) beyond that point results in a decline in performance. Thus, we set \(\lambda _{F}\) to 1e−2.

Study on Scale & Tile

We perform ablation experiments on the patch unit size in S&T to analyze its influence on transferability. We individually train patches of sizes {64, 128, 256} and compare their transferability on objects of various sizes.

The results are presented in Fig. 7. The accuracy of Faster R-CNN reaches 80.2%. Without S&T, DIB-UAP leads to a decline of 28.3% (from 80.2 to 51.9%) in mAP@0.5. This decline is most significant for small and medium objects, with reductions of 19 and 28.3%, respectively. In the case of large objects, the decrease is only 18.2%, representing the smallest relative decline. When the patch unit is set to 64 \(\times \) 64, there is minimal decline in mAP@0.5, and the mAP decrease for medium and large objects is insignificant. However, when the patch unit sizes are set to 128 \(\times \) 128 and 256 \(\times \) 256, the accuracy on large objects decreases by over 8% compared to the 64 \(\times \) 64 setting. As a result, there is a substantial improvement in transferability on mAP@0.5. This demonstrates the effectiveness of S&T on medium and large objects. Based on the analysis of Fig. 7, we set the patch unit size in S&T to 128 \(\times \) 128.

Study on attack layer

The term "attack layer" refers to the layer where the training variable \(\epsilon \) is injected during the feature extraction stage of DIB-UAP. In the second stage, we disrupt the features extracted from this layer. The selection of the attack layer has a significant impact on the attack transferability of DIB-UAP. Therefore, we conduct ablation experiments to investigate it. In these experiments, we study on the features extracted by the backbone network. Assuming the input image has a size of M \(\times \) M, the feature maps have sizes of {M/8, M/16, M/32}.

Table 5 demonstrates the results, illustrating the impact of different feature map sizes on transferability. As the size gradually decreases, the attack transferability also decreases progressively. It indicates that disrupting larger feature maps, which contain more object information, can yield superior results.

Additionally, we analyse the effect of the number of attacked channels (n) on transferability. The results demonstrate a decline in transferability as n increases. This observation is consistent with the principle underlying the DIB, where detectors rely on a few critical features for detection, while other features are either model-specific or redundant. Attacking these features not only fails to improve transferability but also hinders it.

In accordance with the results in Table 5, we set the attack layer to the feature map of size M/8 and the number of attacked channels to n=10.

Detection results

Figure 8 illustrates the detection results of DIB-UAP on Faster R-CNN. It is evident from the figure that the detector generates erroneous results, encompassing false positives, false negatives, and missed detections.

To validate the practicality of DIB-UAP, we conduct tests on two of the latest commercial detection platforms. The results, depicted in Fig. 9, exhibit a noticeable presence of false positive human objects. This illustrates DIB-UAP’s transferability to black-box detectors with unknown parameters and structures in real-world scenarios. For a more detailed examination of the test results, please refer to https://github.com/comea23/DIB-UAP. Additionally, we have made DIB-UAP publicly available for comparison by other researchers.

Conclusion

To tackle the issue of limited transferability in UAP attacks caused by variations in the structure of DNN-based detectors, we propose DIB-UAP. DIB-UAP operates at the feature level, utilizing the deep information bottleneck (DIB) to extract crucial features that impact detection results and generating a UAP specifically designed to disrupt these features.

Furthermore, we analyze the visual characteristics of other UAP methods and find that their relatively small local textures are the main reason for their limited effectiveness in attacking medium to large-sized objects. Consequently, we introduce the Scale & Tile augmentation method to address this limitation. By optimizing patch units at multiple scales and tiling them, we expand the texture size within DIB-UAP. This augmentation allows for greater disruption in lower-sized features, thereby improving its transferability to medium-to-large objects.

Experimental results confirm the remarkable transfer attack performance of DIB-UAP on various detectors. We hope that our approach can contribute to enhancing the robustness of DNN-based detectors.