
1 Introduction

Deep neural networks (DNNs) and convolutional neural networks (CNNs) currently enjoy great interest. They have become the state-of-the-art methods in many fields of machine learning and have been applied to various problems, including image recognition, speech recognition, and natural language processing [1].

In the area of pattern recognition, deep and convolutional neural networks have achieved several human-competitive results [2,3,4]. Given these results, the question arises whether these methods achieve capabilities similar to human vision, such as generalization. This paper deals with a property of machine learning models that demonstrates a difference. Consider a classifier and an image correctly classified by it as one class (for example, an image of a hand-written digit five). It is possible to change the image so slightly that to human eyes there is almost no difference, yet the classifier labels the image as something completely different (such as the digit zero).

This counter-intuitive property of neural networks was first described in [5]. It relates to the stability of a neural network with respect to small perturbations of its inputs. Such perturbed examples are known as adversarial examples. Adversarial examples differ only slightly from correctly classified examples drawn from the data distribution, but they are classified incorrectly by the classifier learned on the data. Not only are they classified incorrectly, they can often be classified as a class of our choice.

The vulnerability to adversarial examples is not limited to deep neural network models; it spreads through all machine learning methods, including shallow architectures (such as SVMs) and decision trees. Networks with local units, such as RBF networks, are known to be more robust to adversarial examples. In this paper we examine the use of RBF layers in a deep architecture to protect it from adversarial examples. We propose a new architecture obtained by stacking a deep architecture and an RBF network. We show that such a model is much less vulnerable to adversarial examples than the original model, while its accuracy remains almost the same.

This paper is organized as follows. First, in Sect. 2 we explain how adversarial examples work and review related work. Then, Sect. 3 introduces the new architecture. Section 4 deals with the results of our experiments. Finally, Sect. 5 concludes our paper.

2 Adversarial Examples and Related Work

Adversarial examples were first introduced in [5]. The paper shows that, given a trained network, it is possible to arbitrarily change the network's prediction by applying an imperceptible non-random perturbation to an input image. Such perturbations are found by optimizing the input to maximize the prediction error. The box-constrained Limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm (L-BFGS) is used for this optimization.

On some data sets, such as ImageNet, the adversarial examples are so close to the original examples that they are indistinguishable to the human eye. In addition, the authors state that adversarial examples are relatively robust and generalize between neural networks with different numbers of layers or activations, or trained on different subsets of the training data. In other words, if we use one neural network to generate a set of adversarial examples, these examples are also misclassified by another neural network, even when it was trained with different hyperparameters or on a different subset of examples.

Paper [6] suggests that it is the linear behaviour in high-dimensional spaces that is sufficient to cause adversarial examples (for example, a linear classifier exhibits this behaviour, too). The authors propose a fast method of generating adversarial examples by adding a small vector in the direction of the sign of the gradient.

Let us have a linear classifier and let x and \(\tilde{x} = x + \eta \) be input vectors. The classifier should assign x and \(\tilde{x}\) the same class as long as \( ||\eta ||_{\infty } \le \varepsilon \), where \(\varepsilon \) is the precision of the features.

Consider the dot product between weight vector w and input vector \(\tilde{x}\):

$$ w^{\top }\tilde{x} = w^{\top }x + w^{\top }\eta . $$

Adding \(\eta \) to the input vector increases the activation by \(w^{\top }\eta \). We can maximize this increase by choosing \(\eta = \varepsilon \,\text {sgn}(w)\). If n is the dimension of w and m is the average magnitude of its elements, the activation grows by \(\varepsilon mn\). Note that \(||\eta ||_{\infty }\) does not grow with n, but the change in activation caused by the perturbation \(\eta \) grows linearly with n. It is thus possible to make many infinitesimal changes to the input that add up to a large change of the activation. Therefore, a simple linear model can have adversarial examples if its input has sufficient dimensionality.
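
The effect can be illustrated with a few lines of NumPy; the dimensionality, the value of \(\varepsilon \), and the random weights are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10_000                         # input dimensionality (illustrative)
eps = 0.01                         # per-feature perturbation bound (illustrative)
w = rng.normal(size=n)             # weights of a linear classifier
x = rng.normal(size=n)             # an input vector

eta = eps * np.sign(w)             # worst-case perturbation, ||eta||_inf == eps
shift = w @ eta                    # change of the activation: eps * sum(|w_i|)

print(np.max(np.abs(eta)))             # eps, independent of n
print(shift)                           # grows linearly with n
print(eps * np.mean(np.abs(w)) * n)    # the same value, written as eps * m * n
```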

The above observation can be generalized to nonlinear models [6]. Let \(\theta \) be the parameters of a model, x an input, y the target for x, and \(J(\theta , x, y)\) the cost function. Linearizing the cost function around the current value of \(\theta \), we obtain an optimal perturbation \( \eta = \varepsilon \;\text {sgn}(\nabla _x J(\theta , x, y)).\) This represents an efficient way of generating adversarial examples and is referred to as the fast gradient sign method (FGSM). See Fig. 1 for adversarial images crafted by FGSM on the MNIST data set [7] for a CNN.
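
For illustration, a minimal FGSM sketch in TensorFlow/Keras follows (this is not the CleverHans implementation used later in Sect. 4); `model` is assumed to be a trained Keras classifier with softmax outputs, and x, y a batch of inputs in [0, 1] and one-hot labels.

```python
import tensorflow as tf

loss_object = tf.keras.losses.CategoricalCrossentropy()

def fgsm(model, x, y, eps):
    """Return x + eps * sgn(grad_x J(theta, x, y)), clipped to the valid pixel range."""
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_object(y, model(x))
    grad = tape.gradient(loss, x)       # gradient of the cost w.r.t. the input
    x_adv = x + eps * tf.sign(grad)     # one step in the direction of the sign
    return tf.clip_by_value(x_adv, 0.0, 1.0)
```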

Fig. 1. Original test examples and corresponding adversarial examples crafted by FGSM with \(\epsilon \) 0.2, 0.3, and 0.4.

Other results on fooling deep and convolutional networks can be found in [8]. This paper studies the generation of images that look like noise or regular patterns using evolutionary algorithms. To generate regular patterns the authors use compositional pattern-producing networks (CPPNs), which have a structure similar to neural networks. A CPPN takes pixel coordinates (x, y) as input and outputs a pixel value; its nodes compute functions such as Gaussian, sine, sigmoid, and linear. The CPPNs are evolved by evolutionary algorithms, and the resulting images are regular patterns that are classified with high confidence as objects from the training set.

In [9] another class of crafting algorithms is proposed, and in [10] a black-box strategy to adversarial attacks is described.

In our paper [11], we examine the vulnerability to adversarial examples across a variety of machine learning methods. We propose a genetic algorithm for generating adversarial examples. Although the evolutionary search for adversarial examples is slower than the techniques described in [5, 6], it enables us to obtain adversarial examples without access to the model's weights. Thus, we have a unified approach for a wide range of machine learning models, including not only neural networks but also support vector machine classifiers (SVMs), decision trees, and possibly others. The only thing this approach needs is the ability to query the classifier to evaluate a given example. See Fig. 2 for adversarial images crafted by our genetic algorithm for a CNN.
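
A simplified black-box sketch in the spirit of such a genetic attack is given below; it is not the exact algorithm from [11]. It only queries a hypothetical `predict_proba` function returning class probabilities for a batch, so no access to weights or gradients is needed, and all parameter values are illustrative.

```python
import numpy as np

def genetic_attack(predict_proba, x, target, pop_size=50, generations=200,
                   eps=0.3, mutation_rate=0.05, seed=0):
    """Evolve a perturbed copy of `x` (pixel values in [0, 1]) toward class `target`."""
    rng = np.random.default_rng(seed)
    pop = np.clip(x + rng.uniform(-eps, eps, size=(pop_size,) + x.shape), 0.0, 1.0)
    for _ in range(generations):
        fitness = predict_proba(pop)[:, target]        # confidence in the target class
        pop = pop[np.argsort(fitness)[::-1]]           # sort: best individuals first
        if np.argmax(predict_proba(pop[:1]), axis=1)[0] == target:
            return pop[0]                              # success: classified as target
        parents = pop[: pop_size // 2]
        # crossover: each child mixes the pixels of two randomly chosen parents
        idx = rng.integers(0, len(parents), size=(pop_size, 2))
        mask = rng.random((pop_size,) + x.shape) < 0.5
        children = np.where(mask, parents[idx[:, 0]], parents[idx[:, 1]])
        # mutation: re-perturb a small fraction of pixels around the original image
        mut = rng.random(children.shape) < mutation_rate
        children[mut] = np.clip(
            np.broadcast_to(x, children.shape)[mut]
            + rng.uniform(-eps, eps, size=mut.sum()), 0.0, 1.0)
        children[0] = pop[0]                           # elitism: keep the best individual
        pop = children
    return pop[0]                                      # best candidate found so far
```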

Fig. 2. Adversarial examples crafted by GA. Images on the first line are all classified as zero by the target CNN, images on the second line as one, etc.

The question of how to make neural networks robust to adversarial examples is dealt with in [12]. The authors tried several methods, from noise injection and Gaussian blurring, through the use of an autoencoder, to a method they call the deep contractive network (which adds to the cost function a regularization term penalizing large changes of activations with respect to changes of the input). However, these methods cure the adversarial examples only to some extent.

Another attempt to prevent adversarial examples is proposed in [13] and is based on distillation, i.e., training another network on the outputs produced by the target network.

3 Deep Networks with RBF Layers

RBF networks [14,15,16,17,18] are neural networks with one hidden layer of RBF units and a linear output layer.

By an RBF unit we mean a neuron with multiple real inputs \(\varvec{x}=(x_1,\ldots ,x_n)\) and one output y. Each unit is determined by an n-dimensional vector \(\varvec{c}\) called its centre. It can have an additional parameter \(\beta > 0\) that determines its width.

The output y is computed as:

$$\begin{aligned} y = \varphi (\xi ); \;\;\;\; \xi = \beta ||\varvec{x}-\varvec{c}|| \end{aligned}$$
(1)

where \(\varphi :{\mathbb R}\rightarrow {\mathbb R}\) is a suitable activation function, typically the Gaussian \(\varphi (z)=e^{-z^2}\).

Thus, the network computes the following function \(\varvec{f}:{\mathbb R}^n\rightarrow {\mathbb R}^m\):

$$\begin{aligned} f_s(\varvec{x}) = \sum _{j=1}^{h} w_{js}\varphi \left( \beta _j \parallel \varvec{x} - \varvec{c_j} \parallel \right) , \end{aligned}$$
(2)

where \(w_{js}\in {\mathbb R}\) are the output weights and \(f_s\) is the output of the s-th output unit.
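
For concreteness, the hidden RBF units of Eq. (2) can be sketched as a custom Keras layer; this is only an illustration under the definitions above, not the implementation [25] used in Sect. 4. The linear output layer of Eq. (2) then corresponds to a dense layer stacked on top of this layer.

```python
import tensorflow as tf

class RBFLayer(tf.keras.layers.Layer):
    """Hidden layer of Gaussian RBF units: exp(-(beta_j * ||x - c_j||)^2)."""

    def __init__(self, units, initial_beta=2.0, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.initial_beta = initial_beta

    def build(self, input_shape):
        # centres c_j and widths beta_j are trainable parameters
        self.centers = self.add_weight(
            name="centers", shape=(self.units, int(input_shape[-1])),
            initializer=tf.keras.initializers.RandomUniform(0.0, 1.0),
            trainable=True)
        self.betas = self.add_weight(
            name="betas", shape=(self.units,),
            initializer=tf.keras.initializers.Constant(self.initial_beta),
            trainable=True)

    def call(self, inputs):
        diff = tf.expand_dims(inputs, 1) - self.centers    # (batch, units, dim)
        sq_dist = tf.reduce_sum(tf.square(diff), axis=-1)  # ||x - c_j||^2
        return tf.exp(-tf.square(self.betas) * sq_dist)    # Gaussian activation
```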

The history of RBF networks can be traced back to the 1980s, particularly to the study of interpolation problems in numerical analysis, where radial basis functions were first introduced in the solution of the real multivariate interpolation problem [19, 20].

The RBF networks benefit from a rich spectrum of learning possibilities. The study of these algorithms together with experimental results was also published in our papers [21, 22].

With the boom of deep learning, the popularity of RBF networks has faded. However, we show that they can bring advantages when combined with deep neural networks.

Fig. 3. Deep neural network architecture followed by RBF network.

We introduce a new deep architecture defined as the concatenation of a feedforward deep neural network and an RBF network (see Fig. 3). Let us have a deep neural network DN that realizes a function \(f_{DN}: {\mathbb R}^n \rightarrow {\mathbb R}^m\) and an RBF network RBF that realizes a function \(f_{RBF}: {\mathbb R}^m \rightarrow {\mathbb R}^m\). Feeding the outputs of DN to the inputs of RBF, we obtain a network implementing the function \(f: {\mathbb R}^n \rightarrow {\mathbb R}^m\), where

$$ f(\varvec{x}) = f_{RBF}(f_{DN}(\varvec{x})). $$

For classification tasks we can add a softmax activation function to the output layer of the RBF network.

The training procedure is the following:

  1. train the DN by any appropriate learning algorithm,

  2. set the centres of the RBF part randomly, drawn from the uniform distribution on (0, 1),

  3. set the parameters \(\beta \) to a constant value,

  4. initialize the weights of the RBF output layer to small random values,

  5. retrain the whole network DNRBF by backpropagation.

Since the DN part of the network is already trained, it is usually sufficient to train the whole stacked network for only a few epochs.
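
The procedure can be sketched in Keras as follows, using the hypothetical `RBFLayer` from the sketch above and a placeholder `build_dn()` for the deep network described in Sect. 4; `x_train` and `y_train` are assumed to be the MNIST training images and one-hot labels, and the number of RBF units is an assumption, as the paper does not fix it.

```python
import tensorflow as tf

num_classes = 10
rbf_units = 64        # hypothetical: the number of RBF units is not fixed by the procedure

# 1. train the deep network DN by any appropriate learning algorithm
dn = build_dn()       # placeholder for the MLP or CNN described in Sect. 4
dn.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"])
dn.fit(x_train, y_train, epochs=20, batch_size=128)

# 2.-4. append the RBF network: centres drawn from U(0, 1), constant beta,
#       and small random weights in the linear (softmax) output layer
rbf_hidden = RBFLayer(units=rbf_units, initial_beta=2.0)
rbf_output = tf.keras.layers.Dense(
    num_classes, activation="softmax",
    kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.05))
dnrbf = tf.keras.Sequential([dn, rbf_hidden, rbf_output])

# 5. retrain the whole stacked network DNRBF by backpropagation for a few epochs
dnrbf.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"])
dnrbf.fit(x_train, y_train, epochs=3, batch_size=128)
```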

4 Experimental Results

For our experiments we use the FGSM implemented in the CleverHans library [23]. To implement the deep neural networks we use Keras [24] and our own RBF layer implementation [25]. The scripts used for the experiments can be found at [26].

Table 1. Accuracies on legitimate test examples and adversarial examples for MLP and MLPRBF with various initial widths. Average accuracies over 30 runs of the learning algorithm.
Table 2. Accuracies on legitimate test examples and adversarial examples for CNN and CNNRBF with various initial widths. Average accuracies over 30 runs of the learning algorithm.
Fig. 4. Accuracies on legitimate and adversarial data for MLP and MLPRBF with various initial widths.

Fig. 5. Accuracies on legitimate and adversarial data for CNN and CNNRBF with various initial widths.

We have two target architectures: an MLP (two dense hidden layers with 512 ReLU units each and a dense output layer of 10 softmax units) and a CNN (two convolutional layers with 32 3\(\,\times \,\)3 filters, ReLU activations, a 2\(\,\times \,\)2 max pooling layer, a dense layer with 128 ReLU units, and a dense output layer of 10 softmax units).
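
These architectures can be reconstructed in Keras roughly as follows; this is a sketch in which the input shapes assume 28\(\,\times \,\)28 MNIST images and both convolutional layers are assumed to have 32 filters.

```python
import tensorflow as tf

def build_mlp():
    return tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

def build_cnn():
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
        tf.keras.layers.MaxPooling2D((2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
```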

These two architectures were trained 30 times by RMSProp, for 20 epochs (MLP) and 12 epochs (CNN). We obtained an average test accuracy of 98.35% for the MLP and 98.97% for the CNN, but only 1.95% (MLP) and 8.49% (CNN) on adversarial data crafted by FGSM from the test set.

To each of the 30 trained networks we added an RBF network and retrained the whole new network for 3 epochs. We found that the results depend on the parameters \(\beta \) of the Gaussians, so we tried several initial setups. The best results were obtained with an initial \(\beta \) of 2.0: on adversarial data, the accuracy was 89.21% for MLPRBF and 74.57% for CNNRBF. The complete results can be found in Tables 1 and 2 and Figs. 4 and 5. They show that adding an RBF network to a deep network may significantly decrease its vulnerability to adversarial examples.

In addition, Table 3 lists the average accuracies on adversarial data crafted by FGSM with different values of \(\epsilon \).

Table 3. Accuracies on adversarial data crafted by FGSM with different \(\epsilon \).

5 Conclusion

In this paper we dealt with the problem of adversarial examples. We have proposed a new deep architecture obtained by stacking a feedforward deep neural network and an RBF network. Only a few learning epochs are needed to retrain the whole stacked network and to reach an accuracy close to that of the original deep neural network. We have shown that the new stacked network is much less vulnerable to adversarial examples than the original one.