1 Introduction

In recent years, convolutional neural networks (CNNs) and other deep learning technologies have achieved unprecedented accuracy and efficiency, leading to remarkable success in tasks such as image classification, object detection, and segmentation. This success has cemented the status of deep learning as a powerful tool in computer vision, and when applied to medical image analysis, it has the potential to change the way medical professionals diagnose and treat various diseases. Deep learning is currently applied to various medical image analysis tasks, including, but not limited to, skin lesion diagnosis [1, 2], tumor segmentation [3, 4], and diabetic retinopathy detection [5, 6]. These advances hold promise for the early detection of disease, reduced medical errors, and lower healthcare costs.

As deep learning emerged as a promising technology, a problem began to surface: deep learning models are vulnerable to adversarial attacks. It was found in a study [7] that adding a small perturbation, almost imperceptible to the human visual system, to an image can cause a deep learning classifier to completely change its prediction. The consequences of misdiagnosis or mistreatment due to manipulated medical images can be severe, and adversarial attacks can also enable health insurance fraud. Therefore, the existence of adversarial vulnerabilities poses a serious challenge to the reliability and robustness of deep learning systems deployed in clinical settings.

Numerous scholars have proposed effective algorithms for generating adversarial images [8,9,10,11,12,13]. One of the most notable of these is the ‘one-pixel attack’ [14], which can deceive deep learning classification models by introducing subtle perturbations to just a single pixel or a few pixels. This technique is particularly intriguing because limiting the number of modified pixels effectively conceals the adversarial alteration and mitigates the issue of perturbation perceptibility. In view of this, our study focused on the one-pixel attack and assessed its effectiveness on two different types of medical images: colon pathology and dermatoscopic images. According to the experimental results, most medical images can be perturbed into adversarial images.

The overall contributions of this study are as follows:

  • Test the effectiveness of one-pixel attacks on various types of medical images.

  • Test the effectiveness of one-pixel attacks on both small- and full-sized medical images.

  • Test the effectiveness of one-pixel attacks on colored and grayscale medical images.

2 Related works

2.1 Classification of adversarial attack methods

Most adversarial attacks can be divided into the following categories based on different attributes:

2.1.1 White-box and black-box

Adversarial attacks can be divided into white-box and black-box attacks according to whether the structure and parameters of the attacked model are known. A white-box attack is generally stronger because the attacker knows the exact structure and parameters of the target network model, whereas a black-box attacker does not know the details of the model, only its inputs and outputs. Although white-box attacks are much stronger than black-box attacks, the latter are more realistic, because in practice it is difficult for an attacker to gain direct access to the internals of the target model and carry out a white-box attack.

  • White-Box Attacks: The earliest and most famous white-box attack method is the Fast Gradient Sign Method (FGSM) proposed by Goodfellow et al. in 2014 [8], which adds a small perturbation to the input in the direction of the sign of the gradient of the loss with respect to that input, causing the model to make incorrect predictions with high confidence (a minimal FGSM sketch is given after this list). FGSM is a simple but effective attack that became the basis of many other white-box attack methods. The BIM attack proposed by Kurakin et al. in 2016 [9] applies the FGSM update iteratively with smaller perturbations. The PGD attack proposed by Madry et al. in 2018 [10] iteratively maximizes the loss within a bounded perturbation set, projecting each step back into that set. Finally, the DeepFool attack proposed by Moosavi-Dezfooli et al. in 2016 [11] finds the smallest adversarial perturbation by linearizing the decision boundary. White-box attacks can also be used against models that have been trained with defense mechanisms, such as adversarial training. The Carlini-Wagner (C&W) attack proposed by Carlini and Wagner in 2017 [13] is an example: it is optimized to generate adversarial examples with a high success rate, even against adversarially trained models.

  • Black-Box Attacks: The two main types of black-box attacks are transfer-based and query-based. Papernot et al. in 2016 proposed a transfer-based attack method in which an attacker trains their own substitute model, crafts adversarial examples against the substitute, and transfers them to a victim model with very little information about the victim [15]. Meanwhile, Dong et al. proposed a translation-invariant attack method [16] to generate adversarial examples with better transferability. The Boundary Attack proposed by Brendel et al. in 2018 [17] is an example of a query-based black-box attack: it finds the boundaries between the decision regions of different categories by iteratively querying the model and perturbing the input data, and once a boundary is found, it generates an adversarial example by moving the input data toward it. Boundary attacks produce high-quality adversarial examples with low query complexity. Yan et al. proposed the subspace attack method [18], which reduces query complexity by limiting the search directions of the gradient estimation to promising subspaces spanned by the input gradients of a few reference models.
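
To make the gradient step behind FGSM concrete, the following is a minimal PyTorch sketch; the function name, the `eps` value, and the use of cross-entropy loss are illustrative assumptions rather than the exact setup of [8].

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=0.03):
    """One-step FGSM: move each input pixel by +/- eps along the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)    # loss of the current prediction w.r.t. the true label y
    loss.backward()
    x_adv = x + eps * x.grad.sign()        # perturb in the direction that increases the loss
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixel values in the valid [0, 1] range
```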

2.1.2 Targeted and non-targeted

Adversarial attack methods can be broadly categorized based on their objectives as targeted attacks and non-targeted attacks. The primary distinction between these two forms of attacks lies in the determination of the attack's direction. For instance, in a targeted attack, if an image initially belongs to class A and the goal is to manipulate it in such a way that it is misclassified as class B, it is categorized as a targeted attack. Conversely, the aim of a non-targeted attack is simply to have the image recognized as anything other than class A, without specifying a particular alternative class.

2.1.3 Single-step and iterative attack

Attack methods can be classified into two main categories according to their approach: single-step (one-shot) and iterative. A single-step attack is distinguished by its speed but tends to have lower attack strength; the Fast Gradient Sign Method (FGSM) [8] is an example. By contrast, iterative attacks involve multiple attack steps, each with a predefined step length, which enhances their effectiveness. The Projected Gradient Descent (PGD) method [10] is a prominent example of an iterative attack, as is the Carlini-Wagner (C&W) attack [13], which continuously optimizes perturbations to achieve specific objectives.

2.2 One-pixel attack

The one-pixel attack was proposed by Su Jiawei et al. in 2019 [14], who argued that most previous adversarial attack methods failed to consider extremely limited scenarios, modifying more pixels than necessary and thereby making the perturbation detectable by the human eye. They therefore proposed a new black-box attack method: given only the label probabilities, without knowledge of the model parameters, a classification model can be misled by perturbing only one pixel found with the differential evolution (DE) method [19].

One-pixel attacks have several advantages over other types of adversarial attacks: they require less data and are more flexible. One-pixel attacks only require black-box feedback (probability labels) and no internal information about the target deep learning models, such as gradients or network structures. This makes them more difficult to defend against, because attackers do not need to know how the deep learning model works in order to craft an adversarial image. Additionally, one-pixel attacks can be applied to a wider variety of deep learning models, including networks that are non-differentiable or whose gradients are difficult to compute.

Several detection and defense methods have been proposed against one-pixel attacks on deep learning models. One of the first methods [20] involved denoising the image before feeding it into the classification model to remove tampered pixels. Other methods include trigger detection methods and candidate detection methods [21] designed to identify suspicious pixels. Ensemble methods [22] have also been used to mitigate the vulnerabilities of individual models against adversarial attacks. Adversarial training [23] is another common approach, which involves training models with adversarial samples to enhance their resilience to attacks. A one-pixel attack detection method that uses variational autoencoders recently emerged [24].

2.3 Adversarial attacks on medical images

Adversarial attacks can pose a huge threat in the medical field for two main reasons: economic incentives and technical vulnerabilities [25]. Firstly, some parties in the healthcare system may have a financial interest in manipulating diagnoses and prognoses. As a proportion of the World Health Organization's estimated global healthcare spending in 2013 ($7.35 trillion or €5.65 trillion), the global average healthcare fraud and error losses amounted to 6.19% ($455 billion or €350 billion) [26]. In one investigation, more than 45 million medical images and their patient metadata were found to be exposed and freely accessible, with no hacking tools required, on over 2,000 unprotected medical servers across 67 countries, including the United States, United Kingdom, France, and Germany [27]. In recent years, attempts have been made to attack medical images with one-pixel attacks. For example, Korpihalkola and colleagues created adversarial images in 2019 [28] using the Tumor Proliferation Assessment Challenge 2016 (TUPAC16) medical image dataset and used them to attack IBM CODAIT's MAX breast cancer detector. As a result, they successfully increased the confidence level of misclassified labels from 1.14% to 84.34%. In another study [29], Korpihalkola and his team again employed adversarial images from the TUPAC16 pathology dataset, this time aiming to generate adversarial images in which the perturbed pixel colors closely resembled the surrounding colors, making them harder to detect by the human eye. Although the adversarial images generated in that study were visually challenging, some of the attack attempts were unsuccessful. A similar study [30] showed that it was difficult for medical images to survive pixel attacks, raising the issue of the accuracy of medical image classification and the importance of a machine learning model's ability to resist these attacks in computer-aided diagnosis.

3 Methodology and research framework

3.1 One-pixel attack

Suppose the original image is represented by an n-dimensional array \(x = ({x}_{1},{x}_{2},\dots ,{x}_{n})\) and f is the model chosen to attack. When the original image x is input to f, the model outputs the confidence level f(x) of the class to which x belongs. The adversarial image is generated by perturbing pixels of the original image x. Here, the perturbation is defined as \(e(x) = ({e}_{1},{e}_{2},\dots ,{e}_{n})\) and its length is limited to L. Supposing that the class set of the dataset is \(C = \left({c}_{1},{c}_{2},\dots ,{c}_{n}\right),\) the original image belongs to class cori, and it needs to be changed into an adversarial class cadv, with cadv and cori \(\in\) C, this can be done using the following equation,

$$\underset{{e\left(x\right)}^{*}}{{\text{max}}}{f}_{{c}_{adv}}(x+e\left(x\right))\;subject\;to\;{\Vert e(x)\Vert }_{0}\le L$$
(1)

In a one-pixel attack scenario, since only one pixel needs to be changed, the value of L is set to 1. The most direct way to find the best solution is an exhaustive search, which involves trying every pixel position and every pixel value in the image. For a \(224\times 224\) RGB image, there are as many as \(224\times 224\times 256\times 256\times 256=\mathrm{841,813,590,016}\) possibilities, which shows that it is impossible to generate an adversarial image by exhaustive search in real time and that differential evolution is a more effective way to simulate an adversarial attack.
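
As a concrete illustration of this search space, the sketch below applies one candidate perturbation, encoded as an (X, Y, R, G, B) array, to a copy of an RGB image; the NumPy height-width-channel layout and the function name are our assumptions.

```python
import numpy as np

def apply_candidate(image, candidate):
    """Return a copy of an RGB image with the single pixel at (x, y) replaced by (r, g, b)."""
    x, y, r, g, b = np.rint(candidate).astype(int)
    perturbed = image.copy()          # never modify the original image
    perturbed[y, x] = (r, g, b)       # image assumed to be H x W x 3, uint8
    return perturbed

# Exhaustively trying every (x, y, r, g, b) on a 224 x 224 image would require
# 224 * 224 * 256**3 ≈ 8.4e11 evaluations, which is why DE is used instead.
```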

3.2 Differential evolution

Differential evolution (DE) [19], a branch of evolution strategies (ES) [31], is a population-based optimization algorithm for solving complex multi-modal optimization problems, developed by mimicking the natural breeding process. During each iteration, a new set of candidate solutions (children) is generated from the current population (parents). Each child is then compared with its corresponding parent and survives only if it is fitter (possesses a higher fitness value) (Fig. 1). In this way, the goals of maintaining diversity and improving fitness values can be achieved simultaneously simply by comparing each parent with its child.

Fig. 1
figure 1

Overview of the differential evolution procedure

As it does not use gradient information for optimization, DE can be applied to a wider range of optimization problems than gradient-based methods. Using DE to generate adversarial images has the following advantages:

  1) More likely to find global optima: DE is a meta-heuristic and is therefore less prone to local minima than gradient descent or greedy search algorithms.

  2) Requires less information about the target network: DE does not require the optimization problem to be differentiable. This is critical for the generation of adversarial images because a) some networks are not differentiable and b) calculating gradients requires much more information about the target network, which is unrealistic in many cases.

  3) Simplicity: The approach proposed here is independent of the classifier used. Knowing the probability labels is sufficient for the attack to take place.

3.3 DE process

3.3.1 Initial populations

The process (Fig. 1) begins with the generation of possible solutions to the problem. Each potential solution is called a ‘gene’, and the set of solutions produced in each ‘generation’ of the ES run is called a ‘population’. As mentioned above, f is the model to attack and x is the base image. In a one-pixel attack, each solution takes the form of an (X, Y, R, G, B) array if the base image is colored, or an (X, Y, \(I\)) array if it is grayscale. X denotes the x coordinate, \(Y\) the y coordinate, \(R\) the value of the red color channel, \(G\) the value of the green color channel, \(B\) the value of the blue color channel, and \(I\) the grey level. The population size is set to 100, which means that there are 100 adversarial arrays in each generation of the DE. The initial population is generated randomly, giving a set of parental adversarial arrays \({ARR}^{j}=\left({arr}_{1}^{j},{arr}_{2}^{j},\dots , {arr}_{100}^{j}\right).\) The superscript indicates the number of the generation, and the subscript indicates the index.
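
A minimal sketch of generating this random initial population (NumPy; the function name and the uniform sampling ranges are our assumptions):

```python
import numpy as np

def init_population(pop_size=100, img_size=32, grayscale=False, rng=None):
    """Random parental genes: each row is (x, y, r, g, b) for color or (x, y, i) for grayscale."""
    rng = rng or np.random.default_rng()
    coords = rng.integers(0, img_size, size=(pop_size, 2))    # pixel position
    channels = 1 if grayscale else 3
    colors = rng.integers(0, 256, size=(pop_size, channels))  # new pixel value(s)
    return np.hstack([coords, colors]).astype(float)
```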

3.3.2 Mutation

New genes are generated in the mutation process using the following formula,

$${arr}_{i}^{{j}{'}}={arr}_{{r}_{1}}^{j}+F\bullet ({arr}_{{r}_{2}}^{j}-{arr}_{{r}_{3}}^{j})$$
(2)

\({arr}_{i}^{j{'}}\) denotes the array of generation \(j\) with index \(i\), where the apostrophe marks the offspring population. \(r\) is a random index ranging from 1 to the size of the parent population. \(F\) is the mutation factor, which ranges from 0 to 1 and decides the strength of the mutation. According to the above formula, the mutant gene is composed of a random parental gene \({arr}_{{r}_{1}}^{j}\) plus the difference between two other parental genes \(({arr}_{{r}_{2}}^{j}, {arr}_{{r}_{3}}^{j})\); the mutation factor decides how strongly this difference affects the “base gene” \({arr}_{{r}_{1}}^{j}\). The offspring population is generated by applying the above equation 100 times. Assuming this is generation \(j\) of the DE process, the generated offspring population is denoted as \({ARR}^{{j}{'}}=\left({arr}_{1}^{{j}{'}},{arr}_{2}^{{j}{'}},\dots , {arr}_{100}^{{j}{'}}\right)\).
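
A minimal sketch of this DE/rand/1 mutation in Eq. (2) (NumPy; clipping the mutated genes back into valid coordinate and color ranges is our practical addition):

```python
import numpy as np

def mutate(parents, F=0.5, img_size=32, rng=None):
    """Build the offspring population: child_i = parent_r1 + F * (parent_r2 - parent_r3)."""
    rng = rng or np.random.default_rng()
    pop_size = len(parents)
    children = np.empty_like(parents)
    for i in range(pop_size):
        r1, r2, r3 = rng.choice(pop_size, size=3, replace=False)  # three distinct random parents
        children[i] = parents[r1] + F * (parents[r2] - parents[r3])
    children[:, :2] = np.clip(children[:, :2], 0, img_size - 1)   # keep coordinates on the image
    children[:, 2:] = np.clip(children[:, 2:], 0, 255)            # keep colors in [0, 255]
    return children
```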

3.3.3 Crossover

Since the original one-pixel attack did not include a crossover, it was not used in this work.

3.3.4 Selection

Unlike many other evolution strategies, in which the top-performing n genes survive to the next generation, DE uses a pairwise survival strategy to select the genes that will survive. The selection process is applied to each parent–offspring pair. At this stage there are two sets of arrays, \({ARR}^{j}\) and \({ARR}^{{j}{'}}\), each of which contains 100 arrays in the form \(\left(X,Y,R,G,B\right)\) or \((X,Y,I)\). Each array generates a corresponding adversarial image by modifying the original image, so the algorithm now has two groups of adversarial images, \({X}^{j}=\left({X}_{1}^{j},{X}_{2}^{j},\dots , {X}_{100}^{j}\right)\) and \({X}^{{j}{'}}=\left({X}_{1}^{{j}{'}},{X}_{2}^{{j}{'}},\dots , {X}_{100}^{{j}{'}}\right),\) which are input to the trained model f to obtain two sets of confidence-level arrays \({CL}^{j}=\left({cl}_{1}^{j},{cl}_{2}^{j},\dots , {cl}_{100}^{j}\right)\) and \({CL}^{{j}{'}}=\left({cl}_{1}^{{j}{'}},{cl}_{2}^{{j}{'}},\dots , {cl}_{100}^{{j}{'}}\right)\). The performance of each adversarial image is evaluated from its confidence levels. Supposing that the class set of the dataset is \(C = ({c}_{1},{c}_{2},\dots ,{c}_{n})\) and the original image belongs to class \({c}_{ori}\), each confidence-level array can be written as \({cl}_{i}^{j}=\left({cl}_{{i}_{1}}^{j},{cl}_{{i}_{2}}^{j},\dots , {cl}_{{i}_{n}}^{j}\right), {cl}_{{i}_{k}}^{j}\in \left[0, 1\right], {\sum }_{k=1}^{n} {cl}_{{i}_{k}}^{j}=1\). Each element of a confidence-level array corresponds to one class: \({cl}_{{i}_{k}}^{j}\) is how confident the model is that image i belongs to class \({c}_{k}\). To decide which adversarial arrays survive, the algorithm compares each pair of confidence-level arrays \({(cl}_{i}^{j}, {cl}_{i}^{{j}{'}})\) at the \(k\)-th position. Supposing that \({c}_{k}\) is the target class, a targeted attack aims to maximize the fitness score, which means that the confidence level with the higher value should be kept; for instance, the parental gene \({arr}_{i}^{j}\) performs better if \({cl}_{{i}_{k}}^{j}> {cl}_{{i}_{k}}^{{j}{'}}\). On the other hand, the goal of a non-targeted attack is to minimize the fitness score (with \({c}_{k}\) now being the original class), which means that the confidence level with the lower value should be kept; the parental gene \({arr}_{i}^{j}\) performs better if \({cl}_{{i}_{k}}^{j}< {cl}_{{i}_{k}}^{{j}{'}}\). Notably, the algorithm preserves the parental gene when the performance of the two genes is equal. This group of preserved genes is then passed to the next step.
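
A minimal sketch of this pairwise survival step, assuming the fitness of each gene has already been computed from its confidence levels (function and argument names are ours):

```python
import numpy as np

def select(parents, children, parent_fitness, child_fitness, minimize=True):
    """Pairwise survival: child i replaces parent i only if its fitness is strictly better."""
    better = child_fitness < parent_fitness if minimize else child_fitness > parent_fitness
    survivors = np.where(better[:, None], children, parents)   # ties keep the parental gene
    fitness = np.where(better, child_fitness, parent_fitness)
    return survivors, fitness
```

Here `minimize=True` corresponds to a non-targeted attack (driving the original-class confidence down) and `minimize=False` to a targeted attack.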

3.3.5 Termination

An early-stop mechanism is established to determine whether the performance is already good enough. After the above selection process, the algorithm has 100 adversarial arrays corresponding to 100 adversarial images, each classified into a specific class in class set C. In a non-targeted attack, the process terminates if at least one image is classified as a class different from the original one; in a targeted attack, it terminates if at least one image is classified as the target class. Otherwise, the preserved group of genes becomes the new parental population and the DE process is re-run. The process also terminates when the maximum number of iterations is reached.
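
A minimal sketch of this early-stop check, assuming the predicted class of each adversarial image is the argmax of its confidence-level array (function and argument names are ours):

```python
import numpy as np

def should_stop(confidences, original_class, target_class=None):
    """Stop DE early once any adversarial image changes class (non-targeted)
    or reaches the target class (targeted)."""
    predictions = np.asarray(confidences).argmax(axis=1)   # predicted class per adversarial image
    if target_class is None:                               # non-targeted attack
        return bool((predictions != original_class).any())
    return bool((predictions == target_class).any())       # targeted attack
```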

3.4 Setting of a one-pixel attack fitness score in a multiclass dataset and multi-label dataset

Only multiclass datasets were used in the original one-pixel attack paper, but one medical image can contain multiple diseases, which makes the task a multi-label classification problem: the classifier not only needs to determine whether the image is diseased, but also needs to identify all the diseases in the image. Recall that a one-pixel attack uses DE to generate adversarial images and that DE requires a fitness score to assess the performance of the generated images; the original one-pixel attack study used the confidence level of the target or original class as the fitness score. Suppose that the class set of the dataset is \(C=\left({c}_{1},{c}_{2},\dots ,{c}_{n}\right)\) and the original image belongs to class \({c}_{ori}\in C\). When this image is processed by the classifier, it produces a confidence-level vector \(cl=\left({cl}_{1},{cl}_{2},\dots ,{cl}_{n}\right), {cl}_{i}\in \left[\mathrm{0,1}\right], \sum_{i=1}^{n}{cl}_{i}=1\), where \({cl}_{ori}\in cl\) denotes the confidence level of the original class. For a non-targeted attack, the goal is to minimize \({cl}_{ori}\); for a targeted attack in which the adversarial image should become class \({c}_{adv}\in C\), the goal is to maximize the corresponding confidence level \({cl}_{adv}\). The same technique is used for the multiclass datasets in this study. For the multi-label datasets, however, the algorithm cannot look at just one specific class; instead, it must consider multiple class confidence levels at once. If the class set of the dataset is \(C=\left({c}_{1},{c}_{2},\dots ,{c}_{n}\right),\) the image label can be written as an array \(l=\left({l}_{1},{l}_{2},\dots ,{l}_{n}\right)\), where each \({l}_{i}\) corresponds to class \({c}_{i}\) in class set \(C\) and takes the value 0 or 1: if the value is 0, the image does not contain this class; if it is 1, it does. The image can be considered a multi-dimensional array \(x=\left({x}_{1},{x}_{2},\dots ,{x}_{n}\right)\) and the classifier is represented by \(f\). When the image \(x\) is input to the classifier, \(f(x)\) produces a confidence-level array \(cl=\left({cl}_{1}, {cl}_{2},\dots ,{cl}_{n}\right), {cl}_{i}\in \left[\mathrm{0,1}\right], i\in N\). A threshold \(\gamma\) is set in the range from 0 to 1: if \({cl}_{i}>\gamma\), the algorithm considers the image to contain a class \({c}_{i}\) disease; therefore, if all \({cl}_{i}< \gamma\), the image is of a normal patient with no disease. Suppose that an image \({x}_{ori}\) to be attacked has the original class set \({c}_{ori}\), a subset of \(C\) with label form \({l}_{ori}\) (a subset of \(l\)), and that the classifier \(f({x}_{ori})\) successfully predicts the image and produces the confidence level \({cl}_{ori}\). As all the elements of \({c}_{ori}\) need to be considered at once, cosine similarity is used in this study to construct the fitness score. The generated adversarial image \({x}_{adv}\) is input into the classifier as \(f\left({x}_{adv}\right),\) producing an adversarial confidence level \({cl}_{adv}\). For a non-targeted attack, the fitness score is as follows,

$$similarity\left({{\varvec{l}}}_{{\varvec{o}}{\varvec{r}}{\varvec{i}}},{{\varvec{c}}{\varvec{l}}}_{{\varvec{a}}{\varvec{d}}{\varvec{v}}}\right)=\frac{{{\varvec{l}}}_{{\varvec{o}}{\varvec{r}}{\varvec{i}}}\bullet {{\varvec{c}}{\varvec{l}}}_{{\varvec{a}}{\varvec{d}}{\varvec{v}}}}{{\text{max}}\left(\Vert {{\varvec{l}}}_{{\varvec{o}}{\varvec{r}}{\varvec{i}}}\Vert \Vert {{\varvec{c}}{\varvec{l}}}_{{\varvec{a}}{\varvec{d}}{\varvec{v}}}\Vert ,\delta \right)} , \delta ={10}^{-5}$$
(3)

\(\delta\) is a very small number that prevents the denominator from becoming 0. The goal is to minimize the above formula.

If this is a targeted attack and the target class set is \({c}_{adv}\), where \({c}_{adv}\) is a subset of \(C\) and has label form \({l}_{adv}\) (a subset of \(l\)), the formula is as follows,

$$similarity\left({{\varvec{l}}}_{{\varvec{a}}{\varvec{d}}{\varvec{v}}},{{\varvec{c}}{\varvec{l}}}_{{\varvec{a}}{\varvec{d}}{\varvec{v}}}\right)=\frac{{{\varvec{l}}}_{{\varvec{a}}{\varvec{d}}{\varvec{v}}}\bullet {{\varvec{c}}{\varvec{l}}}_{{\varvec{a}}{\varvec{d}}{\varvec{v}}}}{{\text{max}}(\Vert {{\varvec{l}}}_{{\varvec{a}}{\varvec{d}}{\varvec{v}}}\Vert \Vert {{\varvec{c}}{\varvec{l}}}_{{\varvec{a}}{\varvec{d}}{\varvec{v}}}\Vert ,\delta )} , \delta ={10}^{-5}$$
(4)

The goal is to maximize the above formula. Note that the algorithm rescales the label and confidence-level values from [0, 1] to [-1, 1] before calculating the cosine similarity, because if the label were all zeros, the cosine similarity would always be zero, causing the fitness function to fail.
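
A minimal sketch of the cosine-similarity fitness in Eqs. (3) and (4), including the rescaling from [0, 1] to [-1, 1] described above (function and variable names are ours):

```python
import numpy as np

def cosine_fitness(label, confidence, delta=1e-5):
    """Cosine similarity between a multi-label target vector and the classifier's confidences."""
    l = 2.0 * np.asarray(label, dtype=float) - 1.0        # rescale labels from [0, 1] to [-1, 1]
    cl = 2.0 * np.asarray(confidence, dtype=float) - 1.0  # rescale confidences the same way
    denom = max(float(np.linalg.norm(l) * np.linalg.norm(cl)), delta)
    return float(l @ cl) / denom

# Non-targeted attacks minimize cosine_fitness(l_ori, cl_adv);
# targeted attacks maximize cosine_fitness(l_adv, cl_adv).
```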

3.5 Research framework

3.5.1 Model selection

Convolution-based neural networks are the most common type of model used to classify images, and the ResNet series [32] is the most widely used today. The MedMNIST paper [33] used both ResNet18 and ResNet50 models, and an analysis of their accuracy (ACC) and area under the ROC curve (AUC) scores showed that they performed almost identically. The ResNet50 model was chosen here for its deeper structure.

3.5.2 Pre-processing images and training model

MedMNIST (small-sized) and the corresponding full-sized datasets were used for this experiment. The datasets were divided into training and test sets with a ratio of 8:2. The images of MedMNIST were resized to \(32 \times 32\); the images of the full-sized datasets were resized to \(224 \times 224\).

The first layer of the model was adjusted to the size and number of channels of the input images, and the last layer was adjusted to have the same number of nodes as the total number of classes. The model was trained to achieve the following targets: (1) training accuracy > 95%; (2) test accuracy \(\ge\) the MedMNIST benchmark accuracy – 5%. The benchmark accuracy of MedMNIST is shown in Table 1, and the experimental settings of the model can be found in Section 4.
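
A minimal sketch of how the first and last layers of a pre-trained ResNet50 can be adapted as described above (torchvision; the exact replacement layers are our assumption, since the paper does not specify them):

```python
import torch.nn as nn
from torchvision import models

def build_classifier(num_classes, in_channels=3, small_input=False):
    """Pre-trained ResNet50 with its first and last layers adapted to the dataset."""
    model = models.resnet50(pretrained=True)
    if small_input or in_channels != 3:
        # Smaller kernel and stride suit 32 x 32 inputs; channel count matches the data.
        model.conv1 = nn.Conv2d(in_channels, 64, kernel_size=3, stride=1, padding=1, bias=False)
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # one output node per class
    return model
```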

Table 1 Benchmarks of MedMNIST

3.5.3 Finding base images

The trained model was used to classify the test set images and only those that were correctly classified were retained. In the next step, the DE method was used to determine the best perturbation pixel that could turn these images into adversarial ones.

3.5.4 DE Process

During the DE process, the initial number of candidate solutions (the population) was 100, and another 100 candidate solutions (children) were produced at each iteration using the usual DE formula. The candidate solutions (parents and children) and the base images were used to generate adversarial images, which were then fed into the trained model (ResNet50) to obtain the corresponding confidence levels. Each child competed with its corresponding parent based on the population index. If the early-stop criterion was met or the maximum number of iterations was reached, the process stopped; otherwise, the winners survived to the next iteration.

3.5.5 Attack the trained model

After finishing the DE process, we used the generated adversarial images to test our trained model. The research framework is shown in Fig. 2.

Fig. 2
figure 2

Research framework. a Use the training dataset to pre-train the model (ResNet50). b Only correctly predicted images are selected as base images. c Use the DE process to generate one-pixel attacked images. d Test our model using the generated adversarial images

4 Experimental setting

All the experiments involved non-targeted attacks, and each class in each multiclass dataset was subjected to 100 individual experiments. The chosen model was a pre-trained ResNet50 from the Python package torchvision.

4.1 Differential evolution

Unlike the original one-pixel attack paper, this DE process had no crossover, and the mutation factor was set to 0.5. The population size was set to 100, and the maximum number of DE iterations was limited to 100. The process could stop earlier if it met the early-stop condition, namely that the total number of successful images surpassed 1% of the total number of adversarial images generated in that particular iteration.

4.2 Model

A stochastic gradient descent (SGD) optimizer with a learning rate of 0.001 and momentum of 0.9 was used for this experiment. The main model was ResNet50 [32]. Each model was trained for up to 100 epochs with a batch size of 64. Early stopping was implemented to avoid overfitting and improve training efficiency: training terminated once the model reached the 95% training-accuracy threshold.
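
A minimal training-loop sketch matching these settings, assuming a multiclass cross-entropy objective; the dataloader, accuracy bookkeeping, and function name are our assumptions:

```python
import torch
import torch.nn as nn

def train(model, train_loader, epochs=100, accuracy_threshold=0.95):
    """SGD training with lr=0.001 and momentum=0.9; stops once training accuracy reaches 95%."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    for epoch in range(epochs):                 # up to 100 epochs, batch size 64 in the loader
        model.train()
        correct, total = 0, 0
        for images, labels in train_loader:
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            correct += (outputs.argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
        if correct / total >= accuracy_threshold:   # early stopping on training accuracy
            break
    return model
```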

4.3 Hardware and software

The CPU was an 11th Gen Intel(R) Core(TM) i7-11700KF @ 3.60 GHz with 16 GB of RAM. The GPU was an NVIDIA GeForce RTX 3060 with 12 GB of VRAM.

The software was based on the Python package ‘torchvision’. The model was pre-trained, with the first layer adjusted to the size and number of channels of the images and the last layer adjusted to the number of classes of the dataset. The software specifications were OS: Windows 10 x64 Education; programming language: Python 3.9.7; programming modules: numpy 1.20.3, torchvision 0.11.3, pytorch 1.10.2, tqdm 4.62.3, scikit-learn 0.24.2, pillow 8.4.0, tiffile 2021.7.2, matplotlib 3.4.3, pandas 1.3.4, pyyaml 6.0, sqlite 3.36.0, plotly 5.8.2.

4.4 Datasets

Both small- and full-sized images were used in these experiments. Two small-sized datasets from MedMNIST were used: PathMNIST and DermaMNIST. MedMNIST is a collection of 10 open, pre-processed medical datasets standardized for lightweight 28 × 28 image classification. Pathology [34] and Derma [35], the sources of PathMNIST and DermaMNIST, respectively, are the two full-sized image datasets that were used. All datasets were split into training and test sets with a ratio of 8:2 (Tables 2 and 3).

Table 2 Overview of MedMNIST
Table 3 Overview of the dataset used

4.4.1 PathMNIST

These are hematoxylin & eosin-stained histological images of colorectal cancer. The total number of images is 100,000. This is a multi-class dataset, an overview of which is given in Table 4 below.

Table 4 Overview of PathMNIST and pathology

4.4.2 Pathology

The Pathology dataset comes from the NCT-CRC-HE-100K dataset [34] and is the source of PathMNIST. It contains 100,000 non-overlapping image patches extracted from stained human cancer tissue slides and normal tissue. The image size of this dataset is \(3\times 224 \times 224\).

4.4.3 DermaMNIST

This is a multi-class dataset of colored dermatoscopic images of common pigmented skin lesions. It contains 10,015 images in seven categories. An overview of this dataset is provided in Table 5 below.

Table 5 Overview of DermaMNIST and derma

4.4.4 Derma

The Derma dataset contains a total of 10,015 images of size 3 × 600 × 450 pixels [35]. It is the source of DermaMNIST. Table 5 shows the distribution of the categories in the dataset.

5 Experimental results and data analysis

5.1 Test accuracy of the different datasets on the trained model (Table 6)

Table 6 Accuracy of different datasets

5.2 Experimental results

The results for each dataset are presented in pairs: the small-sized dataset and its corresponding full-sized dataset. The graphs and tables are listed in the following order,

  • Examples of successful image attacks

    Because 100 experiments were conducted on each class in each dataset and many images were attacked successfully, only one successful image is presented for each class when the success rate was high. The successfully modified image is shown with a red circle indicating the modified pixel. There are two lines of description under each image: a bold black line indicating the original class and a bold red line showing the class into which it has been transformed. The number in parentheses shows the confidence level of that class. Some confidence levels are shown as “~100%”; this represents a confidence level above 99.94%, because the numbers are rounded to one decimal place.

  • Success rate

    An attack is considered to be successful if ‘the resulting label is different from the original label’. This applies to both multiclass and multi-label datasets. For a multi-label dataset, if the resulting label contains the original label but with classes added or removed, it still counts as a success, because the resulting label is ‘different’ from the original one. The ‘success rate’ is used as the index of attack performance in this study, and its formula is as follows,

    $$Success\;rate\;=\;\frac{Number\;of\;successful\;adversarial\;images}{Total\;number\;of\;experiments}$$
    (5)
  • Success rate table

    The success rate table shows the success rate for each class in the dataset in percentages. The first row is the class name. The second and third rows are the corresponding attack success rates.

  • Ratio of class transformation table

    One of the purposes of this study was to explore the resistance of the images in each class of the dataset. The class transformation table shows the ratio of class transformation for each class. The ratio is calculated as follows: supposing that the original class is denoted as \({c}_{ori}\) and the adversarial class as \({c}_{adv}\), and that there is a total of K classes in the dataset, so that \(ori,adv\in \left[0,K\right], ori,adv, K\in {\varvec{N}}, K>1,\) the number of transformations from class \({c}_{ori}\) to \({c}_{adv}\) is denoted as \({n}_{ori,adv}\). When \(ori=adv\), the attack is considered a failure, so \({n}_{ori,adv}=0\). The transformation ratio of a specific adversarial class, denoted \({T}_{adv}\), can then be calculated as follows (a short code sketch after this list illustrates one way to compute these ratios),

    $${T}_{adv}=\frac{\sum_{ori=0}^{K}{r}_{ori,adv}}{K-F}$$
    (6)

    where \({r}_{ori,adv}\) is the ratio of transformation from class \({c}_{ori}\) to \({c}_{adv}\) and is calculated as follows,

    $${r}_{ori,adv}= \frac{{n}_{ori,adv}}{\sum_{adv=0}^{K}{n}_{ori,adv}}$$
    (7)

    Based on the above formulas, \({T}_{adv}\) is the average transformation ratio into class \({c}_{adv}\), averaged over the origin classes \({c}_{ori}\).

  • Conversion for disease type table

    The number and ratio of images converted from “disease to normal” or “normal to disease” are shown in this table. The column indicates the conversion type (e.g., “Disease to Disease” or “Normal to Disease”). The ‘Count’ column indicates the total number of images belonging to that conversion type, while the ‘Percentage’ column indicates the proportion corresponding to the ‘Count’ column.
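
A short sketch of one way to compute \({r}_{ori,adv}\) and \({T}_{adv}\) from a matrix of transformation counts; here we interpret \(K-F\) in Eq. (6) as the number of origin classes that produced at least one successful transformation, and this interpretation, like the names below, is our assumption.

```python
import numpy as np

def transformation_ratios(counts):
    """counts[ori, adv] = number of successful transformations from class ori to class adv
    (the diagonal is zeroed because ori == adv counts as a failed attack)."""
    counts = np.asarray(counts, dtype=float)
    np.fill_diagonal(counts, 0.0)
    row_totals = counts.sum(axis=1, keepdims=True)                 # successes launched from each origin class
    r = np.divide(counts, row_totals, out=np.zeros_like(counts), where=row_totals > 0)
    contributing = (row_totals.squeeze(-1) > 0).sum()              # origin classes with any success (K - F)
    T = r.sum(axis=0) / max(contributing, 1)                       # average ratio into each adversarial class
    return r, T
```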

5.3 Data analysis of PathMNIST and pathology datasets

5.3.1 Successful attack example of PathMNIST

Figures 3 and 4 show one successful adversarial image for each class of PathMNIST. The modified pixel is marked by a red circle. The black bold line indicates the original class of the image, and the red bold line indicates the adversarial class. If a “~100%” confidence level is shown, it means that the confidence level is greater than 99.94%, because the number was rounded to one decimal point.

Fig. 3
figure 3

PathMNIST successful attack examples (first part) The modified pixel is marked by a red circle. The black bold line indicates the original class of the image, and the red bold line indicates the adversarial class. If a “ ~ 100%” confidence level is shown, it means that the confidence level is greater than 99.94% because the number was rounded to one decimal point

Fig. 4
figure 4

PathMNIST successful attack examples (second part). The modified pixel is marked by a red circle. The black bold line indicates the original class of the image, and the red bold line indicates the adversarial class. If a “ ~ 100%” confidence level is shown, it means that the confidence level is greater than 99.94% because the number was rounded to one decimal point

5.3.2 Successful attack example of pathology (Fig. 5)

Fig. 5
figure 5

Pathology successful attack examples. The modified pixel is marked by a red circle. The black bold line indicates the original class of the image, and the red bold line indicates the adversarial class. If a “ ~ 100%” confidence level is shown, it means that the confidence level is greater than 99.94% because the number was rounded to one decimal point. Notice that “Cancer-associated Stroma” and “Colorectal Adenocarcinoma Epithelium” are “Disease” classes. These two diseased classes are transformed into the normal-type classes “Debris” and “Normal Colon Mucosa”

5.3.3 Comparison of PathMNIST and pathology

The attack success rates of PathMNIST and Pathology are compared in this section to determine whether the size of the image affects the effectiveness of the attack. We noticed that the class “Background” did not have any successful attacks (Table 7). This may be because this class bears little resemblance to the other classes. We also noticed that when the images became larger, the attack success rate dropped significantly (Fig. 6).

Table 7 Success rate of PathMNIST and pathology
Fig. 6
figure 6

Pathology attack success rate comparison

Because the success rate on Pathology is too low to yield a statistically meaningful ratio of class transformation, we only list the ratio of class transformation and the conversion of disease type for PathMNIST (Tables 8 and 9).

Table 8 Ratio of class transformation of PathMNIST
Table 9 Conversion of disease type of PathMNIST

5.4 Data analysis of DermaMNIST and derma

5.4.1 Successful attack example of DermaMNIST (Figs. 7 and 8)

Fig. 7
figure 7

DermaMNIST successful attack examples (first part). The modified pixel is marked by a red circle. The black bold line indicates the original class of the image, and the red bold line indicates the adversarial class. If a “ ~ 100%” confidence level is shown, it means that the confidence level is greater than 99.94% because the number was rounded to one decimal point

Fig. 8
figure 8

DermaMNIST successful attack examples (second part). The modified pixel is marked by a red circle. The black bold line indicates the original class of the image, and the red bold line indicates the adversarial class

5.4.2 Successful attack example of derma (Figs. 9 and 10)

Fig. 9
figure 9

Derma successful attack examples (first part). The modified pixel is marked by a red circle. The black bold line indicates the original class of the image, and the red bold line indicates the adversarial class

Fig. 10
figure 10

Derma successful attack examples (second part). The black bold line indicates the original class of the image, and the red bold line indicates the adversarial class

5.4.3 Comparison of DermaMNIST and derma

The attack success rate, ratio of class transformation, and conversion of disease type of DermaMNIST and Derma are compared in this section to determine if the size of the image affects the effectiveness of the attack.

We noticed that the attack success rate of the class “Dermatofibroma” increased when the images became larger (Fig. 11). By checking the class transformation tables (Tables 10, 11, 12 and 13), we found that the class “Dermatofibroma” is more easily transformed into the class “Basal cell carcinoma” in the Derma dataset than in DermaMNIST. This may be because “Dermatofibroma” and “Basal cell carcinoma” share some resemblance from the model’s perspective. Therefore, we can observe that the attack success rate and the size of the images do not have an absolute correlation.

Fig. 11
figure 11

Derma attack success rate comparison

Table 10 Success rates of DermaMNIST and derma
Table 11 Dermatofibroma trans
Table 12 Ratio of class transformation of DermaMNIST and derma
Table 13 Conversion of disease type of DermaMNIST and derma

6 Discussion

6.1 Reasons for choosing the one-pixel attack as a research method

The first reason for choosing the one-pixel attack is its flexibility and effectiveness, which stem from the inherent features of the differential evolution method it uses. Since DE was designed as a stochastic direct search method, it does not rely on gradient descent. This means that a one-pixel attack can handle non-differentiable networks and requires less computing power. Meanwhile, DE uses a vector population in which the stochastic perturbation of the population vectors can be performed independently. This parallelism allows the one-pixel attack to complete compute-intensive work in less time.

Another reason is that one-pixel perturbations can occur accidentally, for instance through medical device failure or human carelessness, unlike many other methods that modify a large number of pixels and therefore require time and expertise to generate adversarial images. As a one-pixel attack perturbs only a single pixel, it is also difficult for the human eye to perceive. Therefore, it is important to study this type of attack.

6.2 Comparison of the colored and grayscale image datasets

A colored image consists of three RGB channels, whereas a grayscale image contains only one channel. To determine whether the number of channels affects the time required for the experiment and the success rate of the attack, the average DE time was measured on both the colored and grayscale datasets from MedMNIST.

According to Table 14, colored images were found to be more susceptible to attacks. The average DE (differential evolution) time for colored images is nearly the same as for grayscale images, with both under two minutes. This suggests that the number of color channels does not significantly affect the DE computation time.

Table 14 Color channel analysis in MedMNIST dataset

6.3 Limitations of this study

Although many experiments were conducted in this study to illustrate that there may be several potential threats when using AI to recognize medical images, there are two major limitations that could be addressed in future research.

  • The labels of the open-source datasets used in this study may not have been completely accurate, which may have affected the experimental results.

  • The differential evolution algorithm contains a crossover step to increase the diversity of the perturbed parameter vectors. Since the original one-pixel attack paper did not include a crossover step, it was not used in this study. The addition of a crossover step could be considered in future studies to ascertain its impact on the experimental results.

6.4 Future research directions

The experiments in this study demonstrated possible weaknesses of several medical image datasets under one-pixel attacks. Future researchers can build on this study to develop effective defenses against this attack, specifically for medical image datasets. The following are some potential strategies to counter one-pixel attacks:

  1) Adversarial Training: The employment of adversarial training techniques can enhance the robustness of models by including adversarial examples in the training process. Training the model on both clean and perturbed images increases its resilience to pixel attacks.

  2) Input Preprocessing: Input preprocessing techniques, such as denoising, image resizing, or blurring, can help to remove or reduce the impact of perturbations in the input images, making the model more resilient to pixel attacks (a minimal sketch is given after this list).

  3) Model Ensemble: The overall robustness of the classification system can be improved by using an ensemble of multiple models. Aggregating the predictions of multiple models can minimize the impact of individual pixel attacks.

  4) Improve Model Effectiveness: Although a CNN-based model automatically extracts features from the images, a number of hyperparameters must be set before training starts. The choice of these parameters has a significant influence on the design of an effective CNN model. They can be configured by human expertise or trial and error; however, as this process can be arduous and time-consuming, it is better to select the parameters systematically to build an optimized and efficient CNN model. Several variants of differential evolution have been proposed to improve the selection of CNN hyperparameters, such as dynamically hybrid niching differential evolution (DHDE) [36] and self-adaptive parameter control-based differential evolution (SAPCDE) [37]. The model can be improved by effectively tuning these hyperparameters.

  5) Apply Blockchain Technology: When defending against adversarial images, the most important task is to protect the images from tampering; therefore, blockchain is an extremely promising way to enhance the security and integrity of medical images. Because a blockchain is immutable, if an attacker were able to modify a block, all the subsequent blocks in the chain would also have to be modified; the computational cost of this with current technology is great enough to deter such an attack [38]. The use of blockchain in medical imaging can ensure that patients’ data and critical diagnostic information remain tamper-proof and accessible only to authorized individuals. Each medical image can be securely stored and tracked on blockchain's decentralized and immutable ledger, providing a transparent and unalterable record of who accessed the data and when [39]. This not only protects patients’ privacy, but also helps to prevent unauthorized alterations or cyberattacks on sensitive medical imagery, thereby strengthening the trustworthiness of healthcare systems and the accuracy of diagnoses.
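
As an illustration of the input-preprocessing idea in item 2) above, the following is a minimal sketch of a median-filter defense that washes out isolated pixel perturbations before classification; the kernel size, function name, and PyTorch formulation are our assumptions.

```python
import torch
import torch.nn.functional as F

def median_smooth(images, k=3):
    """Apply a k x k median filter to a batch of images with shape B x C x H x W."""
    pad = k // 2
    padded = F.pad(images, (pad, pad, pad, pad), mode="reflect")
    patches = padded.unfold(2, k, 1).unfold(3, k, 1)        # B x C x H x W x k x k neighborhoods
    return patches.contiguous().flatten(-2).median(dim=-1).values

# Usage sketch: classify the smoothed image instead of the raw input.
# prediction = model(median_smooth(adversarial_batch))
```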

7 Conclusion

The experiments conducted in this paper have shown that different types of medical image datasets can be modified into adversarial images by a one-pixel attack without much computing power or time. The adversarial images could change a prediction from no disease to a disease, or vice versa. This indicates that using AI or machine learning models to identify medical images and provide medical assistance or advice is extremely risky.

Therefore, when adversarial images are used to attack medical systems, the following moral hazards may be expected:

  • Misdiagnosis: Adversarial images can be used to trick medical diagnostic systems, causing doctors to misdiagnose patients and leading to unnecessary or delayed treatment that can be life-threatening.

  • Unnecessary treatment: Adversarial images can be used to manipulate the health insurance system to cause patients to receive unnecessary treatment, leading to patient distress, higher medical costs, and waste of medical resources.

  • Unreliable Medical Research: Adversarial images can disrupt medical research, making experimental results unreliable. This could have an adverse effect on the evaluation of new drugs, the establishment of disease models, and other medical research.

  • Trust Issues: When patients and healthcare professionals doubt the reliability of machine learning systems, their trust in automated medical decision making could be eroded. This may hinder the widespread adoption of these technologies, which are supposed to improve the efficiency and quality of healthcare.

  • Fraud: Adversarial imagery can be used to manipulate the health insurance system to generate fraudulent claims.

  • Discrimination: Adversarial imagery can be used to discriminate against specific groups of people. For example, an attacker can generate adversarial images that trick medical diagnostic systems into thinking that a person of a certain race or religion has a certain disease, leading to the unfair treatment of people in this group.

To reduce the moral hazard of adversarial images, the following actions must be taken:

  • Raising awareness of adversarial imagery: Doctors and health insurers need to understand the dangers of adversarial imagery and take steps to prevent it from being used for malicious purposes.

  • Developing detection and defenses against adversarial images: Researchers are constantly developing defenses against adversarial images to protect machine learning models from attacks.

  • Incorporate laws and regulations: Governments can enact laws and regulations that prohibit the malicious use of adversarial images.

All in all, since humans already rely heavily on deep learning as a powerful tool in various fields, it is essential to understand the potential risks of various adversarial images and take measures to prevent them.