
Contradiction neutralization for interpreting multi-layered neural networks


Abstract

This paper proposes a new method for neutralizing contradictions in neural networks. Neural networks exhibit numerous contradictions in the form of contrasts, differences, and errors, making it extremely difficult to find a compromise among them. In this context, neutralization is introduced not to resolve these contradictions but to weaken them by transforming them into more manageable and concrete forms. Contradictions are neutralized, or weakened, through four methods: comprehensive, nullified, compressive, and collective neutralization. Comprehensive neutralization increases the neutrality of all components in a neural network. Nullified neutralization weakens contradictions among different computational and optimization procedures. Compressive neutralization simplifies multi-layered neural networks while preserving as much of the original internal information as possible. Collective neutralization considers as many final networks as possible, obtained under different conditions, inputs, learning steps, and so on. The proposed method was applied to two data sets, one of which consisted of irregular forms produced by natural language processing. The experimental results demonstrate that comprehensive neutralization enhanced the neutrality of all components and represented features over a broader range of components, thereby improving generalization. Nullified neutralization enabled a compromise between neutrality maximization and error minimization. Compressive and collective neutralization of a large number of compressed weights made it possible to interpret the compressed and collective weights; in particular, inputs considered relatively unimportant by conventional methods emerged as highly significant. Finally, these results were compared with those obtained in the field of the human-centered approach to provide a clearer understanding of the significance of contradiction resolution applied to neural networks.


Data availability

The first data set was taken from the following web page:

http://pub.nikkan.co.jp/tahennryou/tahennryou.html

Note that this page has since become unavailable.

The second data set can be obtained from the following web page:

https://www.asakura.co.jp/detail.php?book_code=12261

For further information, please contact the corresponding author.


Acknowledgements

The author would like to thank Mitali Das for correcting and revising the paper.

Author information


Contributions

The author was solely responsible for the conception and design of the work, the acquisition, analysis, and interpretation of data, and the creation of the new software used in the work.

Corresponding author

Correspondence to Ryotaro Kamimura.

Ethics declarations

Submission declaration and verification

This paper is based on the paper “Serially Disentangled Learning for Multi-Layered Neural Networks,” presented at IEA/AIE 2022.

Data and material transparency

The author agrees to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Competing interests

The author has no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1

1.1 Note on the parameter setting

For the practical computation, we modified the original complementary potentiality \(\overline{r}\) with additional parameters to eliminate several extreme values encountered in the middle of learning. The modified complementary potentiality is given by

$$\overline{r}_{jk}^{(t,t+1)} = \left[ 1 - \frac{u_{jk}^{(t,t+1)}}{\max_{j'k'} u_{j'k'}^{(t,t+1)}} + \varepsilon \right]^{\gamma}.$$
(25)

These parameters were used only to stabilize learning by weakening the effects of individual potentialities. The parameters were set to \(\varepsilon = 0.01\) and \(\gamma = 0.05\). With this modified complementary potentiality, the weights for the \((n+1)\)th learning step were computed as follows:

$$w_{jk}^{(t,t+1)}(n+1) = \theta^{wgt}\, \overline{r}_{jk}^{(t,t+1)}(n)\, w_{jk}^{(t,t+1)}(n).$$
(26)

This computation is followed by 50 learning epochs of conventional error minimization, independent of the above computational procedure. The parameter \(\theta^{wgt}\) was set to a large value of 1.8 for the forced method and to 1.0 in all other cases. In addition, the input, output, and layer neutralities were computed with the corresponding parameter \(\theta = 1.0\). For the forced method only, \(\theta^{in}\) and \(\theta^{out}\) were set to 0.9 to stabilize learning, and \(\theta^{lay}\) for the layer potentiality was set to 1.2 to effectively increase the strength of the weights.
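
To make the update rule concrete, the following is a minimal NumPy sketch of Eqs. (25) and (26). Only the two formulas and the reported parameter values (\(\varepsilon = 0.01\), \(\gamma = 0.05\), \(\theta^{wgt}\) of 1.8 or 1.0) come from this appendix; the array shapes, function names, and toy data are illustrative assumptions, not the author's implementation.

import numpy as np

EPSILON = 0.01  # offset in Eq. (25), removes extreme values
GAMMA = 0.05    # exponent in Eq. (25), flattens individual potentialities

def complementary_potentiality(u, epsilon=EPSILON, gamma=GAMMA):
    # Eq. (25): r_bar_jk = [1 - u_jk / max_{j'k'} u_{j'k'} + epsilon]^gamma,
    # where u holds the potentialities u_jk^(t,t+1) between layers t and t+1.
    return (1.0 - u / u.max() + epsilon) ** gamma

def reassign_weights(w, u, theta_wgt=1.0):
    # Eq. (26): w_jk(n+1) = theta_wgt * r_bar_jk(n) * w_jk(n);
    # theta_wgt was 1.8 for the forced method and 1.0 otherwise.
    return theta_wgt * complementary_potentiality(u) * w

# Toy usage with a hypothetical 4-by-3 connection between two layers.
rng = np.random.default_rng(0)
u = rng.random((4, 3))        # stand-in potentialities
w = rng.normal(size=(4, 3))   # stand-in weights between layers t and t+1
w_next = reassign_weights(w, u, theta_wgt=1.0)
# In the procedure above, each such reassignment is followed by 50 epochs
# of conventional error minimization before the next reassignment.

Because the base of the power in Eq. (25) always lies between \(\varepsilon\) and \(1+\varepsilon\), the small exponent \(\gamma\) keeps every potentiality close to one, which is what weakens the effect of any individual potentiality on the weights.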

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Kamimura, R. Contradiction neutralization for interpreting multi-layered neural networks. Appl Intell 53, 28349–28376 (2023). https://doi.org/10.1007/s10489-023-04883-z

