
Contradiction neutralization for interpreting multi-layered neural networks


Abstract

This paper proposes a new method for neutralizing contradictions in neural networks. Neural networks exhibit numerous contradictions in the form of contrasts, differences, and errors, making it extremely difficult to find a compromise among them. In this context, neutralization is introduced not to resolve these contradictions but to weaken them by transforming them into more manageable and concrete forms. Contradictions are neutralized, or weakened, through four methods: comprehensive, nullified, compressive, and collective neutralization. Comprehensive neutralization increases the neutrality of all components in a neural network. Nullified neutralization weakens contradictions among different computational and optimization procedures. Compressive neutralization simplifies multi-layered neural networks while preserving as much of the original internal information as possible. Collective neutralization considers as many final networks as possible, obtained under different conditions, inputs, learning steps, and so on. The proposed method was applied to two data sets, one of which consisted of irregular forms produced by natural language processing. The experimental results demonstrate that comprehensive neutralization enhanced the neutrality of all components and represented features over a broader range of components, thereby improving generalization. Nullified neutralization enabled a compromise between neutrality maximization and error minimization. Compressive and collective neutralization of a large number of compressed weights made it possible to interpret the compressed and collective weights; in particular, inputs considered relatively unimportant by conventional methods emerged as highly significant. Finally, these results were compared with those obtained in the field of the human-centered approach to provide a clearer understanding of the significance of contradiction resolution applied to neural networks.


Data availability

The first data set was taken from the following web page:

http://pub.nikkan.co.jp/tahennryou/tahennryou.html

Note that this page has since become unavailable.

The second data set can be obtained from the following web page:

https://www.asakura.co.jp/detail.php?book_code=12261

For further information, please contact the corresponding author.


Acknowledgements

The author would like to thank Mitali Das for correcting and revising the paper.

Author information


Contributions

The author was solely responsible for the conception and design of the work, the acquisition, analysis, and interpretation of data, and the creation of the new software used in the work.

Corresponding author

Correspondence to Ryotaro Kamimura.

Ethics declarations

Submission declaration and verification

This paper is based on the paper “Serially Disentangled Learning for Multi-Layered Neural Networks,” presented at IEA/AIE 2022.

Data and material transparency

The author agrees to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Competing interests

The author has no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1

1.1 Note on the parameter setting

For the practical computation, we modified the original complementary potentiality \(\overline{r}\) with additional parameters to eliminate several extreme values encountered in the middle of learning. The modified complementary potentiality is given by

$$\overline{r}_{jk}^{(t,t+1)} = \left[ 1 - \frac{u_{jk}^{(t,t+1)}}{\max_{j'k'} u_{j'k'}^{(t,t+1)}} + \varepsilon \right]^{\gamma}.$$
(25)

These parameters were used only to stabilize learning by weakening the effects of individual potentialities. The parameters were set to \(\varepsilon = 0.01\) and \(\gamma = 0.05\). With this modified complementary potentiality, the weights for the \((n+1)\)th learning step were computed as follows:

$$w_{jk}^{(t,t+1)}(n+1) = \theta^{wgt}\, \overline{r}_{jk}^{(t,t+1)}(n)\, w_{jk}^{(t,t+1)}(n).$$
(26)

This computation is followed by 50 learning epochs of conventional error minimization, independent of the above computational procedure. The parameter \(\theta^{wgt}\) was set to a large value of 1.8 for the forced method and to 1.0 in all other cases. In addition, the input, output, and layer neutralities were computed with the corresponding parameter \(\theta = 1.0\). For the forced method only, \(\theta^{in}\) and \(\theta^{out}\) were set to 0.9 to stabilize learning, and \(\theta^{lay}\) for the layer potentiality was set to 1.2 to effectively increase the strength of the weights.
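
To make the update rule concrete, the following is a minimal NumPy sketch of Eqs. (25) and (26). Only the two formulas and the reported parameter values (\(\varepsilon = 0.01\), \(\gamma = 0.05\), \(\theta^{wgt}\) of 1.8 or 1.0) come from this appendix; the array shapes, function names, and toy data are illustrative assumptions, not the author's implementation.

import numpy as np

EPSILON = 0.01  # offset in Eq. (25), removes extreme values
GAMMA = 0.05    # exponent in Eq. (25), flattens individual potentialities

def complementary_potentiality(u, epsilon=EPSILON, gamma=GAMMA):
    # Eq. (25): r_bar_jk = [1 - u_jk / max_{j'k'} u_{j'k'} + epsilon]^gamma,
    # where u holds the potentialities u_jk^(t,t+1) between layers t and t+1.
    return (1.0 - u / u.max() + epsilon) ** gamma

def reassign_weights(w, u, theta_wgt=1.0):
    # Eq. (26): w_jk(n+1) = theta_wgt * r_bar_jk(n) * w_jk(n);
    # theta_wgt was 1.8 for the forced method and 1.0 otherwise.
    return theta_wgt * complementary_potentiality(u) * w

# Toy usage with a hypothetical 4-by-3 connection between two layers.
rng = np.random.default_rng(0)
u = rng.random((4, 3))        # stand-in potentialities
w = rng.normal(size=(4, 3))   # stand-in weights between layers t and t+1
w_next = reassign_weights(w, u, theta_wgt=1.0)
# In the procedure above, each such reassignment is followed by 50 epochs
# of conventional error minimization before the next reassignment.

Because the base of the power in Eq. (25) always lies between \(\varepsilon\) and \(1+\varepsilon\), the small exponent \(\gamma\) keeps every potentiality close to one, which is what weakens the effect of any individual potentiality on the weights.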

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Kamimura, R. Contradiction neutralization for interpreting multi-layered neural networks. Appl Intell 53, 28349–28376 (2023). https://doi.org/10.1007/s10489-023-04883-z

