Multikernel Activation Functions: Formulation and a Case Study

  • Simone Scardapane
  • Elena Nieddu
  • Donatella Firmani
  • Paolo Merialdo
Conference paper
Part of the Proceedings of the International Neural Networks Society book series (INNS, volume 1)


The design of activation functions is a growing research area in the field of neural networks. In particular, instead of using fixed point-wise functions (e.g., the rectified linear unit), several authors have proposed ways of learning these functions directly from the data in a non-parametric fashion. In this paper we focus on the kernel activation function (KAF), a recently proposed framework in which each activation function is modeled as a one-dimensional kernel model, whose weights are adapted through standard backpropagation-based optimization. One drawback of KAFs is the need to select a single kernel function and any associated hyper-parameters. To partially overcome this problem, we motivate an extension of the KAF model in which multiple kernels are linearly combined at every neuron, inspired by the literature on multiple kernel learning. We apply the resulting multi-KAF to a realistic use case, handwritten Latin OCR, on a large dataset collected in the context of the 'In Codice Ratio' project. Results show that multi-KAFs can improve the accuracy of the convolutional networks previously developed for the task, with faster convergence, even with fewer parameters overall.
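To make the idea concrete, the following is a minimal NumPy sketch of a multi-kernel activation function as described above: each scalar pre-activation is passed through a linear combination of one-dimensional kernel expansions over a fixed dictionary of sample points. The specific kernels shown (Gaussian and quadratic polynomial), the function name, and the parameter shapes are illustrative assumptions, not the paper's exact formulation; in a real network, `alpha` and `beta` would be trainable parameters updated by backpropagation.

```python
import numpy as np

def multi_kaf(s, dictionary, alpha, beta, gamma=1.0):
    """Illustrative multi-kernel activation function (multi-KAF sketch).

    s          : (batch,) scalar pre-activations of one neuron
    dictionary : (D,) fixed dictionary of sample points
    alpha      : (K, D) per-kernel mixing weights (trainable in practice)
    beta       : (K,) linear combination weights over the K kernels
    gamma      : bandwidth of the Gaussian kernel (assumed fixed here)
    """
    diff = s[:, None] - dictionary[None, :]                 # (batch, D)
    k_gauss = np.exp(-gamma * diff ** 2)                    # Gaussian kernel
    k_poly = (1.0 + s[:, None] * dictionary[None, :]) ** 2  # polynomial kernel
    kernels = np.stack([k_gauss, k_poly])                   # (K=2, batch, D)
    # For each kernel, mix the D dictionary responses with its alpha row,
    # then linearly combine the K kernel outputs with beta.
    per_kernel = np.einsum('kbd,kd->kb', kernels, alpha)    # (K, batch)
    return np.einsum('k,kb->b', beta, per_kernel)           # (batch,)
```

With `beta = [1, 0]` the layer reduces to a single-kernel (Gaussian) KAF, which illustrates how the multi-kernel form strictly generalizes the original model.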


Keywords: Activation function · Multikernel · OCR · Latin



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. DIET Department, Sapienza University of Rome, Rome, Italy
  2. Department of Engineering, Roma Tre University, Rome, Italy
