Soft Dropout Method in Training of Contextual Neural Networks

  • Krzysztof Wołk
  • Rafał Palak
  • Erik Dawid Burnell
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12034)


Various regularization techniques have been developed to prevent adverse effects that may appear during the training of contextual and non-contextual neural networks. These problems include overfitting, vanishing gradients, and excessive growth of weight values. A commonly used solution that limits many of them is dropout. The goal of this paper is to propose and analyze a new type of dropout: Soft Dropout. Unlike traditional dropout regularization, in Soft Dropout neurons are excluded only partially, to a degree regulated by an additional, continuous muting factor. The paper presents results suggesting that Soft Dropout can help to generate classification models with lower overfitting than the standard dropout technique. Experiments are performed on selected benchmark and real-life datasets with MLP and Contextual Neural Networks.
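The idea of partial exclusion can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `soft_dropout`, the parameter names, and the convention that the muting factor is a multiplier in [0, 1] applied to the selected neurons' outputs are all assumptions made for illustration. No rescaling of the remaining activations is applied here, which is also an assumption.

```python
import random


def soft_dropout(activations, drop_prob, muting_factor, rng=None):
    """Illustrative sketch only (hypothetical names and conventions).

    Each activation is selected with probability drop_prob. A selected
    activation is multiplied by muting_factor instead of being zeroed.
    Under this assumed convention, muting_factor = 0.0 behaves like
    standard dropout and muting_factor = 1.0 leaves the layer unchanged.
    """
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must be in [0, 1]")
    if not 0.0 <= muting_factor <= 1.0:
        raise ValueError("muting_factor must be in [0, 1]")
    if rng is None:
        rng = random.Random()

    muted = []
    for value in activations:
        if rng.random() < drop_prob:
            # Partial exclusion: attenuate the neuron's output.
            muted.append(value * muting_factor)
        else:
            # Neuron not selected: output passes through unchanged.
            muted.append(value)
    return muted


if __name__ == "__main__":
    layer_output = [0.8, -1.2, 0.5, 2.0]
    print(soft_dropout(layer_output, drop_prob=0.5, muting_factor=0.3,
                       rng=random.Random(0)))
```

In this sketch the muting factor interpolates continuously between standard dropout and no dropout, which is one way to read "excluded only partially"; the paper's exact formulation may differ.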


Dropout · Muting factor · Overfitting



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. The Wroclaw Institute of Spatial Information and Artificial Intelligence, Wroclaw, Poland
  2. Wroclaw University of Science and Technology, Wroclaw, Poland
  3. Science Applications International Corporation, Reston, USA
