Abstract
We propose a simple way to increase the robustness of deep neural network models to adversarial examples. A new architecture, obtained by stacking a deep neural network and an RBF network, is proposed. Experiments show that this architecture is much more robust to adversarial examples than the original one, while its accuracy on legitimate data stays roughly the same.
1 Introduction
Deep neural networks (DNN) and convolutional neural networks (CNN) enjoy high interest nowadays. They have become the state-of-the-art methods in many fields of machine learning and have been applied to various problems, including image recognition, speech recognition, and natural language processing [1].
In the area of pattern recognition, deep and convolutional neural networks have achieved several human-competitive results [2,3,4]. Given these results, a natural question is whether these methods achieve capabilities similar to human vision, such as generalization. This paper deals with a property of machine learning models that demonstrates a difference. Let us have a classifier and an image correctly classified by the classifier as one class (for example, an image of a hand-written digit five). It is possible to change the image so slightly that to human eyes there is almost no difference, yet the classifier classifies the image as something completely different (such as the digit zero).
This counter-intuitive property of neural networks was first described in [5]. It relates to the stability of a neural network with respect to small perturbations of its inputs. Such perturbed examples are known as adversarial examples. Adversarial examples differ only slightly from correctly classified examples drawn from the data distribution, but they are classified incorrectly by the classifier learned on the data. Not only are they classified incorrectly, they can often be made to be classified as a class of our choice.
The vulnerability to adversarial examples is not limited to deep neural network models; it extends across machine learning methods, including shallow architectures (such as SVMs) and decision trees. Networks with local units, such as RBF networks, are known to be more robust to adversarial examples. In this paper we examine how RBF layers can be used in a deep architecture to protect it from adversarial examples. We propose a new architecture obtained by stacking a deep architecture and an RBF network. We show that such a model is much less vulnerable to adversarial examples than the original model, while its accuracy remains almost the same.
This paper is organized as follows. First, in Sect. 2 we explain how adversarial examples work and review related work. Then, Sect. 3 introduces the new architecture. Section 4 deals with the results of our experiments. Finally, Sect. 5 concludes our paper.
2 Adversarial Examples and Related Work
Adversarial examples were first introduced in [5]. The paper shows that, given a trained network, it is possible to arbitrarily change the network's prediction by applying an imperceptible non-random perturbation to an input image. Such perturbations are found by optimizing the input to maximize the prediction error. The box-constrained limited-memory Broyden-Fletcher-Goldfarb-Shanno algorithm (L-BFGS) is used for this optimization.
On some data sets, such as ImageNet, the adversarial examples are so close to the original examples that they are indistinguishable to the human eye. In addition, the authors state that adversarial examples are relatively robust and generalize between neural networks with varying numbers of layers or activations, or trained on different subsets of the training data. In other words, if we use one neural network to generate a set of adversarial examples, these examples are also misclassified by another neural network, even when it was trained with different hyperparameters or on a different subset of examples.
Paper [6] suggests that linear behaviour in high-dimensional spaces is sufficient to cause adversarial examples (a linear classifier, for example, exhibits this behaviour too). The authors propose a fast method of generating adversarial examples: adding a small vector in the direction of the sign of the gradient.
Let us have a linear classifier and let x and \(\tilde{x} = x + \eta \) be input vectors. The classifier should assign x and \(\tilde{x}\) the same class as long as \( ||\eta ||_{\infty } \le \varepsilon \), where \(\varepsilon \) is the precision of the features.
Consider the dot product between a weight vector w and the input vector \(\tilde{x}\): \( w^{\top }\tilde{x} = w^{\top }x + w^{\top }\eta . \)
Adding \(\eta \) to the input vector increases the activation by \(w^{\top }\eta \). We can maximize this increase by choosing \(\eta = \varepsilon \,\text {sign}(w)\). If n is the dimension and m is the average magnitude of the elements of w, the activation grows by \(\varepsilon mn\). Note that \(||\eta ||_{\infty }\) does not grow with n, but the change in activation caused by the perturbation \(\eta \) does grow linearly. It is thus possible to make many infinitesimal changes to the input that add up to a large change of the activation. Therefore, even a simple linear model can have adversarial examples if its input has sufficient dimensionality.
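This scaling argument can be checked numerically. The following is a small illustration of our own (not from the paper): for a random linear unit, the worst-case perturbation \(\eta = \varepsilon \,\text {sign}(w)\) changes the activation by \(\varepsilon \sum _i |w_i| \approx \varepsilon mn\), which grows with the dimension n even though \(||\eta ||_{\infty } = \varepsilon \) stays fixed.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.01  # per-feature perturbation bound, ||eta||_inf = eps

for n in (10, 1000, 100000):
    w = rng.normal(size=n)              # weight vector of a linear unit
    x = rng.normal(size=n)              # input vector
    eta = eps * np.sign(w)              # worst-case perturbation
    change = w @ (x + eta) - w @ x      # activation change, equals eps * sum(|w|)
    print(f"n={n:6d}  ||eta||_inf={np.max(np.abs(eta)):.2f}  change={change:.2f}")
```

The printed change grows roughly linearly in n while the per-feature perturbation stays at \(\varepsilon \).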
The above observation can be generalized to nonlinear models [6]. Let \(\theta \) be the parameters of a model, x an input, y the targets for x, and \(J(\theta , x, y)\) the cost function. If we linearize the cost function around the current value of \(\theta \), we obtain an optimal perturbation: \( \eta = \varepsilon \;\text {sgn}(\nabla _x J(\theta , x, y)).\) This represents an efficient way of generating adversarial examples and is referred to as the fast gradient sign method (FGSM). See Fig. 1 for adversarial images crafted by FGSM on the MNIST data set [7] for a CNN.
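As a minimal sketch of FGSM (the logistic model and its gradient below are a toy stand-in of our own, not the paper's setup):

```python
import numpy as np

def fgsm(x, y, grad_fn, eps=0.25):
    """Fast gradient sign method: x_adv = x + eps * sign(grad_x J(theta, x, y))."""
    x_adv = x + eps * np.sign(grad_fn(x, y))
    return np.clip(x_adv, 0.0, 1.0)          # keep pixel values in a valid range

# Toy stand-in for a trained model: logistic regression with fixed weights.
w, b = np.array([2.0, -3.0]), 0.5

def grad_fn(x, y):
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))   # predicted probability of class 1
    return (p - y) * w                        # gradient of -log p(y|x) w.r.t. x

x = np.array([0.4, 0.6])
x_adv = fgsm(x, y=1.0, grad_fn=grad_fn, eps=0.1)  # -> array([0.3, 0.7])
```

Each pixel moves by exactly \(\varepsilon \) in the direction that increases the loss, so \(||\eta ||_{\infty } = \varepsilon \) by construction.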
Other results on fooling deep and convolutional networks can be found in [8]. This paper studies the generation of images that look like noise or regular patterns by evolutionary algorithms. To generate regular patterns the authors use a compositional pattern-producing network (CPPN), which has a structure similar to neural networks. It takes coordinates (x, y) as input and outputs a pixel value; its nodes compute functions such as Gaussian, sine, sigmoid, and linear. The CPPNs are created by evolutionary algorithms, and the resulting images are regular patterns that are classified as desired images from the training set with high confidence.
In [9] another class of crafting algorithms is proposed, and in [10] a black-box strategy for adversarial attacks is described.
In our paper [11], we examine the vulnerability to adversarial examples across a variety of machine learning methods. We propose a genetic algorithm for generating adversarial examples. Though the evolutionary search for adversarial examples is slower than the techniques described in [5, 6], it enables us to obtain adversarial examples without access to the model's weights. Thus, we have a unified approach for a wide range of machine learning models, including not only neural networks, but also support vector machine classifiers (SVMs), decision trees, and possibly others. The only thing this approach needs is the ability to query the classifier to evaluate a given example. See Fig. 2 for adversarial images crafted by our genetic algorithm for a CNN.
The question of how to make neural networks robust to adversarial examples is dealt with in [12]. The authors tried several methods, from noise injection and Gaussian blur, through autoencoders, to a method they call the deep contractive network (which adds to the cost function a regularization term penalizing large changes of activation with respect to changes of the input). However, these methods mitigate adversarial examples only to some extent.
Another attempt to prevent adversarial examples is proposed in [13], based on distillation, i.e. training another network on the outputs produced by the target network.
3 Deep Networks with RBF Layers
RBF networks [14,15,16,17,18] are neural networks with one hidden layer of RBF units and a linear output layer.
By an RBF unit we mean a neuron with multiple real inputs \(\varvec{x}=(x_1,\ldots ,x_n)\) and one output y. Each unit is determined by an n-dimensional vector \(\varvec{c}\) called its centre. It can have an additional parameter \(\beta > 0\) that determines its width.
The output y is computed as: \( y = \varphi (\beta \Vert \varvec{x}-\varvec{c}\Vert ), \)
where \(\varphi :{\mathbb R}\rightarrow {\mathbb R}\) is a suitable activation function, typically the Gaussian \(\varphi (z)=e^{-z^2}\).
Thus, the network computes the following function \(\varvec{f}:{\mathbb R}^n\rightarrow {\mathbb R}^m\): \( f_s(\varvec{x}) = \sum _{j=1}^{h} w_{js}\,\varphi (\beta _j \Vert \varvec{x}-\varvec{c}_j\Vert ), \qquad s=1,\ldots ,m, \)
where \(w_{js}\in {\mathbb R}\) are the output weights, h is the number of RBF units, and \(f_s\) is the output of the s-th output unit.
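A minimal sketch of this forward pass (the concrete centres, widths, and weights below are illustrative values of our own, not from the paper):

```python
import numpy as np

def rbf_forward(x, centers, betas, W):
    """f_s(x) = sum_j W[j, s] * phi(beta_j * ||x - c_j||), with phi(z) = exp(-z^2)."""
    d = np.linalg.norm(centers - x, axis=1)   # distances to the h centres
    phi = np.exp(-(betas * d) ** 2)           # Gaussian unit outputs
    return phi @ W                            # linear output layer

# Two units (h = 2) in R^2 with two outputs (m = 2).
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
betas = np.array([1.0, 1.0])
W = np.eye(2)
out = rbf_forward(np.array([0.0, 0.0]), centers, betas, W)
# out[0] = 1.0 (input sits at the first centre); out[1] = exp(-2) (distance sqrt(2))
```

The local, bell-shaped response of each unit is what makes the network's output decay away from the training data, which is the intuition behind its robustness.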
The history of RBF networks can be traced back to the 1980s, particularly to the study of interpolation problems in numerical analysis. It is where the radial basis functions were first introduced, in the solution of the real multivariate interpolation problem [19, 20].
The RBF networks benefit from a rich spectrum of learning possibilities. The study of these algorithms together with experimental results was also published in our papers [21, 22].
With the boom of deep learning, the popularity of RBF networks has faded. However, we show that they can bring advantages when combined with deep neural networks.
We introduce a new deep architecture defined as a concatenation of a feedforward deep neural network and an RBF network (see Fig. 3). Let us have a deep neural network DN that realizes a function \(f_{DN}: {\mathbb R}^n \rightarrow {\mathbb R}^m\) and an RBF network RBF that realizes a function \(f_{RBF}: {\mathbb R}^m \rightarrow {\mathbb R}^m\). Then, feeding the outputs of DN to the inputs of RBF, we get a network implementing the function \(f: {\mathbb R}^n \rightarrow {\mathbb R}^m\), where \( f(\varvec{x}) = f_{RBF}(f_{DN}(\varvec{x})). \)
For classification tasks, we can add a softmax activation function to the output layer of the RBF network.
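The composition can be sketched end to end. In the following illustration (our own; the one-layer ReLU stand-in for the deep network, the number of RBF units h, and all numeric values are assumptions, not the paper's architecture), the stacked network computes softmax applied to \(f_{RBF}(f_{DN}(\varvec{x}))\):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())      # shift by max for numerical stability
    return e / e.sum()

def f_dn(x, A, b):
    """Toy stand-in for the trained deep network f_DN: R^n -> R^m (one ReLU layer)."""
    return np.maximum(A @ x + b, 0.0)

def f_rbf(z, centers, betas, W):
    """RBF part f_RBF: R^m -> R^m with Gaussian units."""
    phi = np.exp(-(betas * np.linalg.norm(centers - z, axis=1)) ** 2)
    return phi @ W

def f(x, A, b, centers, betas, W):
    """The stacked network: softmax(f_RBF(f_DN(x)))."""
    return softmax(f_rbf(f_dn(x, A, b), centers, betas, W))

rng = np.random.default_rng(1)
n, m, h = 4, 3, 5                        # h (number of RBF units) is our choice
A, b = rng.normal(size=(m, n)), np.zeros(m)
centers = rng.uniform(0.0, 1.0, size=(h, m))
betas = np.full(h, 2.0)
W = rng.normal(size=(h, m))
p = f(rng.normal(size=n), A, b, centers, betas, W)   # class probabilities
```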
The training procedure is the following:
1. Train the DN by any appropriate learning algorithm.
2. Set the centres of the RBF randomly, drawn from the uniform distribution on (0, 1).
3. Set the parameters \(\beta \) to a constant value.
4. Initialize the weights of the RBF output layer to small random values.
5. Retrain the whole network DNRBF by back-propagation.
Since the DN part of the network is already trained, it is usually sufficient to train the whole stacked network for only a few epochs.
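Steps 2-4 of the procedure above can be sketched as follows (a minimal illustration of our own: the number of RBF units h and the 0.01 scale of the initial output weights are assumptions the paper does not fix):

```python
import numpy as np

rng = np.random.default_rng(42)
m, h = 10, 40                                   # DN output dimension; h is our choice

centers = rng.uniform(0.0, 1.0, size=(h, m))    # step 2: centres drawn from U(0, 1)
betas = np.full(h, 2.0)                         # step 3: constant beta (2.0 worked best, Sect. 4)
W_out = rng.normal(0.0, 0.01, size=(h, m))      # step 4: small random output weights
# Step 5, retraining the whole DNRBF by back-propagation, would then run for a
# few epochs in the chosen deep learning framework (Keras in the paper).
```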
4 Experimental Results
For our experiments we use the FGSM implementation from the Cleverhans library [23]. To implement deep neural networks we use Keras [24] with our RBF layer implementation [25]. The scripts used for the experiments can be found at [26].
We use two target architectures: an MLP (two dense hidden layers with 512 ReLU units each and a dense output layer of 10 softmax units) and a CNN (two convolutional layers with 32 3\(\,\times \,\)3 filters and ReLU activation, a 2\(\,\times \,\)2 max-pooling layer, a dense layer with 128 ReLU units, and a dense output layer of 10 softmax units).
These two architectures were each trained 30 times by RMSProp, for 20 epochs (MLP) and 12 epochs (CNN). We obtained 98.35% average accuracy for the MLP and 98.97% average accuracy for the CNN on test data, but only 1.95% (MLP) and 8.49% (CNN) on adversarial data crafted by FGSM from the test set.
To each of the 30 trained networks we added an RBF network and retrained the whole new network for 3 epochs. We found that the results depend on the parameter \(\beta \) of the Gaussians, so we tried several initial setups. The best results were obtained with an initial \(\beta \) of 2.0; on adversarial data they were 89.21% for MLPRBF and 74.57% for CNNRBF. The complete results can be found in Tables 1 and 2 and Figs. 4 and 5. They show that adding an RBF network to a deep network may significantly decrease its vulnerability to adversarial examples.
In addition, Table 3 lists the average accuracies on adversarial data crafted with FGSM with different values of \(\epsilon \).
5 Conclusion
In this paper we dealt with the problem of adversarial examples. We proposed a new deep architecture obtained by stacking a feedforward deep neural network and an RBF network. Only a few learning epochs are needed to retrain the whole stacked network and obtain accuracy close to that of the original deep neural network. We showed that the new stacked network is much less vulnerable to adversarial examples than the original one.
References
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016)
Bengio, Y.: Learning deep architectures for AI. Found. Trends Mach. Learn. 2(1), 1–127 (2009)
Hinton, G.E.: Learning multiple layers of representation. Trends Cognit. Sci. 11, 428–434 (2007)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Bartlett, P., Pereira, F., Burges, C., Bottou, L., Weinberger, K. (eds.) Advances in Neural Information Processing Systems, vol. 25, pp. 1106–1114. Neural Information Processing Systems Foundation (2012)
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks (2013). arXiv:1312.6199
Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples (2014). arXiv:1412.6572
LeCun, Y., Cortes, C.: The MNIST database of handwritten digits (2012)
Nguyen, A.M., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. CoRR abs/1412.1897 (2014)
Papernot, N., McDaniel, P.D., Jha, S., Fredrikson, M., Celik, Z.B., Swami, A.: The limitations of deep learning in adversarial settings. CoRR abs/1511.07528 (2015)
Papernot, N., McDaniel, P.D., Goodfellow, I.J., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against deep learning systems using adversarial examples. CoRR abs/1602.02697 (2016)
Vidnerová, P., Neruda, R.: Evolutionary generation of adversarial examples for deep and shallow machine learning models, pp. 43:1–43:7 (2016)
Gu, S., Rigazio, L.: Towards deep neural network architectures robust to adversarial examples. CoRR abs/1412.5068 (2014)
Papernot, N., McDaniel, P.D., Wu, X., Jha, S., Swami, A.: Distillation as a defense to adversarial perturbations against deep neural networks. CoRR abs/1511.04508 (2015)
Moody, J., Darken, C.: Fast learning in networks of locally-tuned processing units. Neural Comput. 1, 289–303 (1989)
Poggio, T., Girosi, F.: A theory of networks for approximation and learning. Technical report, Cambridge, MA, USA (1989) A. I. Memo No. 1140, C.B.I.P. Paper No. 31
Broomhead, D., Lowe, D.: Multivariable functional interpolation and adaptive networks. Complex Syst. 2, 321–355 (1988)
Peng, J.X., Li, K., Irwin, G.W.: A novel continuous forward algorithm for RBF neural modelling. IEEE Trans. Autom. Control 52(1), 117–122 (2007)
Fu, X., Wang, L.: Data dimensionality reduction with application to simplifying RBF network structure and improving classification performance. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 33(3), 399–409 (2003)
Powell, M.: Radial basis functions for multivariable interpolation: a review. In: IMA Conference on Algorithms for the Approximation of Functions and Data, RMCS, Shrivenham, England, pp. 143–167 (1985)
Light, W.: Some aspects of radial basis function approximation. In: Approximation Theory, Spline Functions and Applications, pp. 163–190. Kluwer Academic Publishers, Dordrecht (1992)
Neruda, R., Kudová, P.: Learning methods for radial basis functions networks. Future Gener. Comput. Syst. 21, 1131–1142 (2005)
Neruda, R., Kudová, P.: Hybrid learning of RBF networks. Neural Netw. World 12(6), 573–585 (2002)
Papernot, N., et al.: cleverhans v2.0.0: an adversarial machine learning library. arXiv preprint arXiv:1610.00768 (2017)
Chollet, F.: Keras (2015). https://github.com/fchollet/keras
Vidnerová, P.: RBF for keras (2017). https://github.com/PetraVidnerova/rbf_keras
Vidnerová, P.: Experiments with deep RBF networks (2017). https://github.com/PetraVidnerova/rbf_tests
Acknowledgments
This work was partially supported by the Czech Grant Agency grant GA18-23827S and institutional support of the Institute of Computer Science RVO 67985807.
Access to computing and storage facilities owned by parties and projects contributing to the National Grid Infrastructure MetaCentrum provided under the programme “Projects of Large Research, Development, and Innovations Infrastructures” (CESNET LM2015042), is greatly appreciated.
© 2018 Springer International Publishing AG, part of Springer Nature
Vidnerová, P., Neruda, R. (2018). Deep Networks with RBF Layers to Prevent Adversarial Examples. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2018. Lecture Notes in Computer Science(), vol 10841. Springer, Cham. https://doi.org/10.1007/978-3-319-91253-0_25