Abstract
Convolutional neural networks (CNNs) have become an essential tool for solving many machine vision and machine learning problems. A central element of these networks is the convolution operator, which essentially computes the inner product between a weight vector and the vectorized image patches extracted by sliding a window over the image planes of the previous layer. In this paper, we propose two classes of surrogate functions for the inner product operation inherent in the convolution operator, and thus obtain two generalizations of the convolution operator. The first is based on the class of positive definite kernel functions, whose application is justified by the kernel trick. The second is based on the class of similarity measures defined in terms of a distance function; we justify this choice by tracing it back to the basic idea behind the neocognitron, the ancestor of CNNs. Both methods are further generalized by allowing a monotonically increasing function (possibly depending on the weight vector) to be applied subsequently. Like any trainable parameter in a neural network, the template pattern and the parameters of the kernel/distance function are trained with the back-propagation algorithm. As an aside, we use the proposed framework to justify the use of the sine activation function in CNNs. Additionally, we discover a family of generalized convolution operators based on the convex combination of the dot-product and the negative squared Euclidean distance functions. Our experiments on the MNIST dataset show that the performance of ordinary CNNs can be matched by generalized CNNs based on weighted L1/L2 distances, demonstrating the applicability of the proposed generalization of convolutional neural networks.
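As an illustration only (not the paper's exact implementation), the convex combination of the dot product and the negative squared Euclidean distance mentioned in the abstract can be sketched in NumPy; the function name and the mixing parameter `alpha` are hypothetical:

```python
import numpy as np

def generalized_conv_response(patches, w, alpha):
    """Surrogate for the inner product inside a convolution: a convex
    combination of the dot product and the negative squared Euclidean
    distance. alpha = 1 recovers the ordinary convolution response;
    alpha = 0 gives a purely distance-based similarity.

    patches: (n, d) matrix whose rows are vectorized image patches
    w:       (d,)  template (weight) vector
    alpha:   mixing coefficient in [0, 1]
    """
    dot = patches @ w
    neg_sq_dist = -np.sum((patches - w) ** 2, axis=1)
    return alpha * dot + (1.0 - alpha) * neg_sq_dist
```

In a generalized CNN, `alpha` could itself be treated as a trainable parameter and updated by back-propagation along with `w`.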
Notes
To understand these details at the level of code, the reader is referred to the implementation of ConvolutionLayer in Caffe.
For example, in our experiments on the MNIST dataset, we have 12 planes in the first convolution layer which, considering a window of size 5, induces a dimensionality of \(12\times 5\times 5=300\) on the input of the second convolution layer.
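This dimensionality can be checked with a minimal im2col-style patch extraction, a simplified sketch of the kind of vectorization Caffe's ConvolutionLayer performs internally; the input shape here is illustrative:

```python
import numpy as np

def im2col(planes, k):
    """Vectorize all k-by-k patches from a stack of image planes.

    planes: (c, h, w) array of c image planes; returns a
    (c*k*k, n_patches) matrix whose columns are the vectorized
    patches the convolution takes inner products with.
    """
    c, h, w = planes.shape
    cols = []
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            cols.append(planes[:, i:i + k, j:j + k].ravel())
    return np.stack(cols, axis=1)

# 12 planes and a 5x5 window: each patch is a 12*5*5 = 300-dim vector
x = np.random.randn(12, 24, 24)
print(im2col(x, 5).shape[0])  # 300
```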
Initialization algorithms usually normalize the variance to 1 [11]. However, we experimentally measured the variance at the output of convolution layers in a network initialized by the Xavier method [6] on the MNIST dataset and found that the standard deviation at the output of the first layer is approximately 0.5.
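A Monte-Carlo sketch of such a measurement is shown below. It assumes the Xavier rule Var(w) = 1/fan_in, a hypothetical fan-in of 25 (a 5×5 window on a single input plane), and an assumed input standard deviation of 0.5; these numbers stand in for the paper's actual MNIST setup rather than reproduce it:

```python
import numpy as np

def conv_output_std(fan_in, input_std, n_samples=100_000, seed=0):
    """Estimate the standard deviation at a convolution output when
    weights are drawn with variance 1/fan_in (one common form of the
    Xavier rule) and inputs are zero-mean with the given std."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, np.sqrt(1.0 / fan_in), size=fan_in)
    x = rng.normal(0.0, input_std, size=(n_samples, fan_in))
    return (x @ w).std()
```

Since E[||w||^2] = 1 under this rule, the output std tracks the input std, which is consistent with measuring a value well below 1 when the inputs themselves are not unit-variance.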
See https://github.com/yihui-he/resnet-cifar10-caffe for the details about the resnet-20 network. Resnet networks were introduced by He et al. [7].
References
Chandar S, Khapra MM, Larochelle H, Ravindran B (2016) Correlational neural networks. Neural Comput 28(2):257–285
Fletcher G, Hinde C (1994) Learning the activation function for the neurons in neural networks. In: ICANN94. Springer, pp 611–614
Fukushima K (1975) Cognitron: A self-organizing multilayered neural network. Biol Cybern 20(3–4):121–136
Fukushima K (1980) Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol Cybern 36(4):193–202
Fukushima K (1988) Neocognitron: A hierarchical neural network capable of visual pattern recognition. Neural Netw 1(2):119–130
Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. AISTATS 9:249–256
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 770–778
Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366
Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Proceedings of The 32nd international conference on machine learning. pp 448–456
Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM International Conference on Multimedia, MM’14, Orlando, Florida, USA. ACM, New York, NY, pp 675–678. https://doi.org/10.1145/2647868.2654889
Krähenbühl P, Doersch C, Donahue J, Darrell T (2016) Data-dependent initializations of convolutional neural networks. In: International conference on learning representations
Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. Master’s thesis, Department of Computer Science, University of Toronto
LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD (1989) Backpropagation applied to handwritten zip code recognition. Neural Comput 1(4):541–551
LeCun Y, Bottou L, Bengio Y, Haffner P (1998a) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
LeCun Y, Bottou L, Orr GB, Müller KR (1998b) Efficient backprop. In: Neural networks: tricks of the trade. pp 9–50
Li P (2016) Two classes of linear equations of discrete convolution type with harmonic singular operators. Complex Var Elliptic Equ 61(1):67–75
Li P (2017a) Generalized convolution-type singular integral equations. Appl Math Comput 311:314–323
Li P (2017b) Some classes of singular integral equations of convolution type in the class of exponentially increasing functions. J Inequal Appl 2017(1):307
Li P, Ren G (2016) Some classes of equations of discrete type with harmonic singular operator and convolution. Appl Math Comput 284:185–194
Lin M, Chen Q, Yan S (2014) Network in network. In: International conference on learning representations
Mairal J (2016) End-to-end kernel learning with supervised convolutional kernel networks. In: Lee DD, Sugiyama M, Luxburg UV, Guyon I, Garnett R (eds) Advances in neural information processing systems 29. Curran Associates Inc., Red Hook, pp 1399–1407
Mairal J, Koniusz P, Harchaoui Z, Schmid C (2014) Convolutional kernel networks. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (eds) Advances in neural information processing systems. pp 2627–2635. https://papers.nips.cc/book/advances-in-neural-information-processing-systems-27-2014
Mishkin D, Matas J (2016) All you need is a good init. In: International conference on learning representations
Nakagawa M (1995) An artificial neuron model with a periodic activation function. J Phys Soc Jpn 64(3):1023–1031
Nalaie K, Ghiasi-Shirazi K, Akbarzadeh-T MR (2017) Efficient implementation of a generalized convolutional neural networks based on weighted euclidean distance. In: 2017 7th international conference on computer and knowledge engineering (ICCKE). pp 211–216. https://doi.org/10.1109/ICCKE.2017.8167877
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis (IJCV) 115(3):211–252
Schölkopf B, Smola A (2002) Learning with kernels: support vector machines, regularization, optimization and beyond. MIT Press, Cambridge
Serre T, Wolf L, Bileschi S, Riesenhuber M, Poggio T (2007) Robust object recognition with cortex-like mechanisms. IEEE Trans Pattern Anal Mach Intell 29(3):411–426
Sopena JM, Romero E, Alquezar R (1999) Neural networks with periodic and monotonic activation functions: a comparative study in classification problems. In: 9th international conference on artificial neural networks: ICANN ’99, IET. pp 323–328
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 1–9
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323:533–536
Acknowledgements
The author wishes to express appreciation to the Research Deputy of Ferdowsi University of Mashhad for supporting this project under Grant No. 2/43037. The author also thanks the anonymous reviewers and his colleagues Ahad Harati and Ehsan Fazl-Ersi for their valuable comments.
Cite this article
Ghiasi-Shirazi, K. Generalizing the Convolution Operator in Convolutional Neural Networks. Neural Process Lett 50, 2627–2646 (2019). https://doi.org/10.1007/s11063-019-10043-7