Abstract
This paper proposes a new method for neutralizing contradictions in neural networks. Neural networks exhibit numerous contradictions in the form of contrasts, differences, and errors, making it extremely difficult to find a compromise among them. In this context, neutralization is introduced not to resolve these contradictions but to weaken them by transforming them into more manageable and concrete forms. Contradictions are neutralized, or weakened, through four neutralization methods: comprehensive, nullified, compressive, and collective. Comprehensive neutralization increases the neutrality of all components in a neural network. Nullified neutralization weakens contradictions among different computational and optimization procedures. Compressive neutralization simplifies multi-layered neural networks while preserving the original internal information as much as possible. Collective neutralization considers as many final networks as possible, obtained under different conditions, inputs, learning steps, and so on. The proposed method was applied to two data sets, one of which consisted of irregular forms produced by natural language processing. The experimental results demonstrate that comprehensive neutralization could enhance the neutrality of all components and represent features across a broader range of components, thereby improving generalization. Nullified neutralization enabled a compromise between neutrality maximization and error minimization. Through compressive and collective neutralization of a large number of compressed weights, it became possible to interpret compressed and collective weights. In particular, inputs considered relatively unimportant by conventional methods emerged as highly significant. Finally, these results are compared with findings from the human-centered approach to clarify the significance of contradiction resolution applied to neural networks.
[Figures 1–15 are available in the online version of this article.]
Data availability
The first data set was taken from the following web page: http://pub.nikkan.co.jp/tahennryou/tahennryou.html (note that this page is no longer available). The second data set can be obtained from https://www.asakura.co.jp/detail.php?book_code=12261.
For further information, please contact the corresponding author.
Acknowledgements
The author would like to thank Mitali Das for correcting and revising the paper.
Author information
Contributions
The author was responsible for all aspects of the work, including its conception and design, the acquisition, analysis, and interpretation of data, and the creation of the software used in the work.
Ethics declarations
Submission declaration and verification
This paper is based on the paper "Serially Disentangled Learning for Multi-Layered Neural Networks," presented at IEA/AIE 2022.
Data and material transparency
The author agrees to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Conflicts of interest and competing interests
The author has no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix 1
1.1 Note on the parameter setting
For practical computation, we modified the original complementary potentiality \(\overline{r}\) with some parameters to eliminate several extreme values encountered in the middle of learning. The modified complementary potentiality is given by
These parameters were used only to stabilize learning by weakening the effects of the individual potentiality. The parameter values were set to \(\varepsilon =0.01\) and \(\gamma =0.05\). With this modified complementary potentiality, the weights for the \((n+1)\)th learning step were computed as follows:
This computation was followed by 50 learning epochs of conventional error minimization, independent of the above computational procedure. The parameter \({\theta }^{wgt}\) was set to a large value of 1.8 for the forced method and to 1.0 in all other cases. The input, output, and layer neutrality were computed with the corresponding parameter \(\theta =1.0\). For the forced method only, \({\theta }^{in}\) and \({\theta }^{out}\) were set to 0.9 to stabilize learning, and \({\theta }^{lay}\) for the layer potentiality was set to 1.2 to effectively increase the strength of the weights.
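The parameter settings above can be collected into a single configuration for reference. The following is a minimal illustrative sketch, not the author's code: all names (`NeutralizationConfig`, `make_config`, the field names) are hypothetical, and only the parameter values stated in this appendix are encoded.

```python
# Hypothetical sketch of the appendix's parameter settings; names are
# illustrative, values are those reported in the text.
from dataclasses import dataclass


@dataclass
class NeutralizationConfig:
    epsilon: float = 0.01        # stabilization parameter for the complementary potentiality
    gamma: float = 0.05          # stabilization parameter for the complementary potentiality
    error_min_epochs: int = 50   # conventional error-minimization epochs after each step
    theta_wgt: float = 1.0       # weight parameter (raised to 1.8 for the forced method)
    theta_in: float = 1.0        # input-neutrality parameter (0.9 for the forced method)
    theta_out: float = 1.0       # output-neutrality parameter (0.9 for the forced method)
    theta_lay: float = 1.0       # layer-potentiality parameter (1.2 for the forced method)


def make_config(forced: bool = False) -> NeutralizationConfig:
    """Return the parameter set for the forced method or the default case."""
    if forced:
        return NeutralizationConfig(theta_wgt=1.8, theta_in=0.9,
                                    theta_out=0.9, theta_lay=1.2)
    return NeutralizationConfig()
```

Grouping the forced-method overrides in one place makes explicit which parameters deviate from the default value of 1.0 and why (stabilizing learning versus strengthening weights).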
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kamimura, R. Contradiction neutralization for interpreting multi-layered neural networks. Appl Intell 53, 28349–28376 (2023). https://doi.org/10.1007/s10489-023-04883-z