
Deep learning with ExtendeD Exponential Linear Unit (DELU)

  • Original Article
  • Published:
Neural Computing and Applications

Abstract

Activation functions are crucial parts of artificial neural networks. Since the first artificial perceptron, many such functions have been proposed; some, such as the Rectified Linear Unit (ReLU), the Exponential Linear Unit (ELU) and other ReLU variants, are in common use today. In this article we propose a novel activation function, called the ExtendeD Exponential Linear Unit (DELU). After introducing DELU and presenting its basic properties, we show through simulations with different datasets and architectures that it may perform better than other activation functions in certain cases. While inheriting most of the good properties of ReLU and ELU, DELU offers an improvement over them by slowing the alignment of neurons in the early stages of the training process. In our experiments, DELU generally performed better than other activation functions on the Fashion-MNIST, CIFAR-10 and CIFAR-100 classification tasks with Residual Neural Networks (ResNets) of different sizes. Specifically, DELU reduced the error rate on the CIFAR datasets with sufficiently high confidence levels compared with ReLU and ELU networks. DELU is also compared on an image segmentation example, and its compatibility with different initializations is tested. Finally, Z-score analysis is employed to statistically verify these success rates, which may be considered a different view of success assessment in neural networks.
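As a rough illustration of the quantities discussed in the abstract, the sketch below implements the standard ReLU and ELU definitions and a simple two-proportion Z-test for comparing the error rates of two trained networks. The function names, the test form and the counts used are illustrative assumptions only; DELU itself and the authors' exact Z-score analysis are defined in the article and are not reproduced here.

```python
import numpy as np

# Reference activations named in the abstract (standard definitions).
# DELU itself is defined in the article and is not reproduced here.
def relu(x):
    return np.maximum(0.0, x)

def elu(x, alpha=1.0):
    # expm1 on the clipped input avoids overflow warnings for large x
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

# Minimal sketch of a two-proportion Z-test for comparing the error rates
# of two trained networks; the authors' exact statistical procedure may
# differ, and the counts used below are placeholders.
def error_rate_z_score(err_a, err_b, n_a, n_b):
    p_a, p_b = err_a / n_a, err_b / n_b          # observed error rates
    p_pool = (err_a + err_b) / (n_a + n_b)       # pooled error rate
    se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se                      # positive => A worse than B

# Hypothetical example: two networks evaluated on 10,000-image test sets.
z = error_rate_z_score(err_a=800, err_b=750, n_a=10_000, n_b=10_000)
print(f"z-score: {z:.3f}")   # compare against e.g. 1.645 (one-tailed, 95%)
```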


Data availability

The datasets used in this study are publicly available from the sources given in the reference list as [25, 26, 32].

References

  1. Ding B, Qian H, Zhou J (2018) Activation functions and their characteristics in deep neural networks. Chin Control Decis Conf 2018:1836–1841. https://doi.org/10.1109/CCDC.2018.8407425

  2. Alhassan AM, Zainon WMNW (2021) Brain tumor classification in magnetic resonance image using hard swish-based RELU activation function-convolutional neural network. Neural Comput Appl 33:9075–9087. https://doi.org/10.1007/s00521-020-05671-3

  3. Çatalbaş B (2022) Control and system identification of legged locomotion with recurrent neural networks (Doctoral Dissertation). Retrieved from http://repository.bilkent.edu.tr/handle/11693/90921

  4. Haykin S (1999) Neural networks: a comprehensive foundation. Prentice Hall, Upper Saddle River, NJ

  5. Dubey SR, Singh SK, Chaudhuri BB (2022) Activation functions in deep learning: a comprehensive survey and benchmark. Neurocomputing 503:92–108. https://doi.org/10.1016/j.neucom.2022.06.111

  6. Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th international conference on machine learning (ICML-10), pp 807–814

  7. Williams A (2017) The art of building neural networks. TheNewStack. https://thenewstack.io/art-building-neural-networks/

  8. Zheng Q, Yang M, Yang J, Zhang Q, Zhang X (2018) Improvement of generalization ability of deep CNN via implicit regularization in two-stage training process. IEEE Access 6:15844–15869. https://doi.org/10.1109/ACCESS.2018.2810849

  9. Li H, Zeng N, Wu P, Clawson K (2022) Cov-Net: a computer aided diagnosis method for recognizing COVID-19 from chest X-ray images via machine vision. Expert Syst with Appl 207:118029. https://doi.org/10.1016/j.eswa.2022.118029

  10. Zhang K, Yang X, Zang J, Li Z (2021) FeLU: a fractional exponential linear unit. In: 2021 33rd Chinese Control and Decision Conference (CCDC), pp 3812–3817. https://doi.org/10.1109/CCDC52312.2021.9601925

  11. Apicella A, Donnarumma F, Isgrò F, Prevete R (2021) A survey on modern trainable activation functions. Neural Netw 138:14–32. https://doi.org/10.1016/j.neunet.2021.01.026

  12. Clevert D-A, Unterthiner T, Hochreiter S (2015) Fast and accurate deep learning by exponential linear units (ELUs). In: The International Conference on Learning Representations (ICLR), pp 1–14. https://doi.org/10.48550/arXiv.1511.07289

  13. Qiumei Z, Dan T, Fenghua W (2019) Improved convolutional neural network based on fast exponentially linear unit activation function. IEEE Access 7:151359–151367. https://doi.org/10.1109/ACCESS.2019.2948112

  14. Adem K (2021) P + FELU: flexible and trainable fast exponential linear unit for deep learning architectures. Neural Comput Appl 34:21729–21740. https://doi.org/10.1007/s00521-022-07625-3

  15. Sakketou F, Ampazis N (2019) On the invariance of the SELU activation function on algorithm and hyperparameter selection in neural network recommenders. In: IFIP International Conference on Artificial Intelligence Applications and Innovations, Springer, Cham, pp 673–685. https://doi.org/10.1007/978-3-030-19823-7_56

  16. Ramachandran P, Zoph B, Le QV (2017) Searching for activation functions. arXiv preprint arXiv:1710.05941v2. https://doi.org/10.48550/arXiv.1710.05941

  17. Zhou Y, Li D, Hou D, Kung SY (2021) Shape autotuning activation function. Expert Syst with Appl 171:114534. https://doi.org/10.1016/j.eswa.2020.114534

  18. Alkhouly AA, Mohammed A, Hefny HA (2021) Improving the performance of deep neural networks using two proposed activation functions. IEEE Access 9:82249–82271. https://doi.org/10.1109/ACCESS.2021.3085855

  19. Li K, Fan C, Li Y, Wu Q, Ming Y (2018) Improving deep neural network with multiple parametric exponential linear units. Neurocomputing 301:11–24. https://doi.org/10.1016/j.neucom.2018.01.084

  20. Github (2018) Code for improving deep neural network with multiple parametric exponential linear units. Github. Retrieved from https://github.com/Coldmooon/Code-for-MPELU

  21. Lu L, Shin Y, Su Y, Karniadakis G (2020) Dying ReLU and initialization: theory and numerical examples. arXiv preprint arXiv:1903.06733v3. https://doi.org/10.48550/arXiv.1903.06733

  22. Billingsley P (1995) Probability and measure, 3rd edn. John Wiley & Sons, New York, NY

  23. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp 448–456, PMLR. https://doi.org/10.48550/arXiv.1502.03167

  24. Alcaide E (2018) E-swish: adjusting activations to different network depths. arXiv: 1801.07145. https://doi.org/10.48550/arXiv.1801.07145

  25. Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. Retrieved from www.cs.utoronto.ca/~kriz/learning-features-2009-TR.pdf

  26. Xiao H, Rasul K, Vollgraf R (2017) Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747. https://doi.org/10.48550/arXiv.1708.07747

  27. Shan S, Willson E, Wang B, Li B, Zheng B, Zhao BY (2019) Gotta catch ’em all: using concealed trapdoors to detect adversarial attacks on neural networks. In: Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, pp 1–14

  28. Ruiz P (2018) Understanding and visualizing ResNets. Towards Data Science. Retrieved from https://towardsdatascience.com/understanding-and-visualizing-resnets-442284831be8

  29. He K, Zhang X, Ren S, Sun J (2016) Identity mappings in deep residual networks. In: European conference on computer vision, Springer, Cham, pp 630–645. https://doi.org/10.1007/978-3-319-46493-0_38

  30. Keras (n.d.) Trains a ResNet on the CIFAR10 dataset. Keras. Retrieved from https://keras.io/zh/examples/cifar10_resnet/

  31. He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE international conference on computer vision, pp 1026–1034. https://doi.org/10.1109/ICCV.2015.123

  32. Parkhi O, Vedaldi A, Zisserman A, Jawahar CV (2012) Cats and dogs. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Retrieved from www.robots.ox.ac.uk/~vgg/data/pets/. https://doi.org/10.1109/CVPR.2012.6248092

  33. Chollet F (2020) Image segmentation with a U-Net-like architecture. Keras. Retrieved from keras.io/examples/vision/oxford_pets_image_segmentation

  34. He Y, Zhang X, Sun J (2017) Channel pruning for accelerating very deep neural networks. In: Proceedings of the IEEE international conference on computer vision. https://doi.org/10.48550/arXiv.1707.06168

  35. Salakhutdinov R, Larochelle H (2010) Efficient learning of deep Boltzmann machines. In: Proceedings of the thirteenth international conference on artificial intelligence and statistics

  36. Papoulis A (1985) Probability, random variables, and stochastic processes, 2nd edn. McGraw-Hill, New York, NY

  37. Epanechnikov VA (1969) Non-parametric estimation of a multivariate probability density. Theory Probab Appl 14(1):153–158

  38. The Math Works Inc. (2021) Kernel Distribution. MathWorks. https://www.mathworks.com/help/stats/kernel-distribution.html

  39. Poor HV (2013) An introduction to signal detection and estimation. Springer Science & Business Media, Berlin

  40. DeVore GR (2017) Computing the Z score and centiles for cross-sectional analysis: a practical approach. J Ultrasound Med 36(3):459–473

  41. Urolagin S, Sharma N, Datta TK (2021) A combined architecture of multivariate LSTM with Mahalanobis and Z-Score transformations for oil price forecasting. Energy 231:120963. https://doi.org/10.1016/j.energy.2021.120963

  42. Adler K, Gaggero G, Maimaitijiang Y (2010) Distinguishability in EIT using a hypothesis-testing model. J Phys: Conf Ser 224(1):12056. https://doi.org/10.1088/1742-6596/224/1/012056

  43. LaMorte WW (2017) Hypothesis testing: upper-, lower-, and two-tailed tests. Boston University School of Public Health. Retrieved from sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_hypothesistest-means-proportions/bs704_hypothesistest-means-proportions3.html

Acknowledgements

This work was supported by the Scientific and Technological Research Council of Türkiye (TÜBİTAK), whose financial support the authors gratefully appreciate. We also gratefully acknowledge the support of NVIDIA Corporation with the donation of the Quadro P6000 GPU used for this research.

Author information

Corresponding author

Correspondence to Burak Çatalbaş.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Çatalbaş, B., Morgül, Ö. Deep learning with ExtendeD Exponential Linear Unit (DELU). Neural Comput & Applic 35, 22705–22724 (2023). https://doi.org/10.1007/s00521-023-08932-z