Abstract
This paper attempts to describe and address a specific puzzle related to compositionality in artificial networks such as Deep Neural Networks, and in machine learning in general. The puzzle identified here touches on a larger debate in Artificial Intelligence concerning epistemic opacity, but it focuses specifically on computational applications of human-level linguistic abilities or properties and on a special difficulty that arises in relation to them. Thus, the resulting issue is both general and unique. A partial solution is suggested.
Notes
Janssen (2012) argues that Frege was not the source (nor an adherent) of the PoC. In fact, he argues that Frege subscribed to a quite different principle for natural language semantics. Its true origins can actually be traced further back than Frege to Lotze, Wundt and Trendelenburg, according to Janssen. Hodges (2012) goes further to trace the concept to the works of the tenth-century Arab scholar Al-Fārābī, who could have in turn found it in third-century commentaries on Aristotle.
Propositional logic is a good example of a formal language with a simple compositional semantics. The meaning of a formula is a truth value and the meaning of a complex formula is a function of the meanings/truth values of its components. Predicate logic is not as simple a matter. Following Pratt (1979), we know that “there is no function such that meaning of \(\forall x\phi\) can be specified with a constraint of the form \(\mathcal {M}(\forall x\phi )=F(\mathcal {M}(\phi ))\)” (Janssen 1997: 498). In other words, the meaning of a universally quantified formula is not straightforwardly given in terms of a function from the meaning of its parts, at least not by means of the standard Tarskian interpretation.
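The compositional semantics of propositional logic described in this note can be illustrated with a minimal sketch. The class names (`Atom`, `Not`, `And`, `Or`) and the function `meaning` are hypothetical names chosen for illustration and are not drawn from the paper. The point is only that the truth value of a complex formula is computed from the truth values of its immediate parts and the connective that combines them.

```python
from dataclasses import dataclass
from typing import Dict, Union


# Hypothetical syntax tree for a small propositional language.
@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Not:
    operand: "Formula"


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"


Formula = Union[Atom, Not, And, Or]


def meaning(formula: Formula, valuation: Dict[str, bool]) -> bool:
    """Return the truth value of `formula` under `valuation`.

    Each clause uses only the meanings of the immediate parts and the
    mode of combination, which is what makes the semantics compositional.
    """
    if isinstance(formula, Atom):
        return valuation[formula.name]
    if isinstance(formula, Not):
        return not meaning(formula.operand, valuation)
    if isinstance(formula, And):
        return meaning(formula.left, valuation) and meaning(formula.right, valuation)
    if isinstance(formula, Or):
        return meaning(formula.left, valuation) or meaning(formula.right, valuation)
    raise TypeError(f"not a formula: {formula!r}")


# Example: p AND (NOT q)
example = And(Atom("p"), Not(Atom("q")))
print(meaning(example, {"p": True, "q": False}))
```

No analogous one-line clause is available for the universal quantifier on the standard Tarskian interpretation, since the truth of a quantified formula depends on how its body behaves across variable assignments rather than on a single truth value of the body.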
Parallelism has other weaknesses though. For instance, it strongly suggests a building metaphor of a step-by-step procedure mapping syntactic combination with semantic interpretation. Possible world semantics does not respect this constraint, nor do semantic formalisms with intermediary representations like Montague’s Type 2.
I neglected to give an interpretation of what is meant by “syntactic rule” here. This is a matter of theoretical perspective to a large extent. Traditionally, categorial grammars have been used as well as phrase structure grammars. However, the options are without obvious limit.
However, the precise definition of word-hood assumed from isolating languages such as English and Chinese is not generalisable to agglutinating languages such as Turkish, Yupik and Nguni languages, partly due to the vague lines between morphology and syntax in these latter families. See Nefdt (2019) for a philosophical view on the difficulty of defining words and Haspelmath (2011) for a linguistic discussion.
In some literature, parthood is defined as a partial ordering, i.e. reflexive, antisymmetric and transitive. This allows a part to be a part of itself, which, when viewed from the point of view of set theory, seems to invite inconsistencies.
Inferentialism’s “top-down” notion of compositionality might not naturally dovetail with some of the remarks made here. Inferentialists tend to take the sentence as the primary unit of meaning and derive subsentential semantic value from there. Specifically, Brandom’s account sees language as recursively structured but doesn’t see meaning as compositional. See Brandom (2007) for more. I thank Bernhard Weiss for this observation.
Consider Jabberwocky sentences or Chomsky’s Colorless Green Ideas Sleep Furiously: even in the absence of knowing what the meaning is, we can still identify what the meaningful parts are (or should be).
I thank an anonymous reviewer for drawing my attention to this possibility and guiding me to seek out more general examples.
I thank an anonymous reviewer for pressing me on this point.
It might help to think of storage here. Words might be stored as units and brought up or recalled during composition independently of their internal structures. According to Baggio et al. (2012: 656) “psychologically speaking, the real issue is about ‘the balance between storage and computation’, and the role compositionality plays there”. Martin and Baggio (2019: 1) even suggest that “human behaviour, including language use and linguistic data, indicates that composing parts into complex structures does not threaten the existence of constituent parts as independent units in the system: parts and wholes exist simultaneously yet independently from one another in the mind and brain.”
There is a tendency in the classical connectionist and current machine learning literature to take compositionality to only involve a recursive relationship between primitive and compound types of some kind (van Gelder 1990, 1994; Baroni 2019). The ways in which this abstract procedure is instantiated are then the particular types of compositionality which are implemented. I think these kinds of definitions run the risk of confusing semantic compositionality with computability and/or combinatoriality. One major difference between the latter concepts and the former is that they can operate on pure strings or syntax without semantic representation. Some experiments in machine learning adopt this confusion and test for compositionality on nonce words or ungrammatical strings. However, the PoC is a semantic principle which is essentially bound up in the syntax-semantics interface and discussions which neglect this aspect can therefore fail to capture its nature.
For comparisons between AlphaGo and Deep Blue of the previous AI generation, see Schubbach (2019).
Take Schelling’s famous model of segregation. With a minor preference function (30% satisfaction) and two kinds of agents distributed randomly in a population, a macro-level segregation effect is produced. But this equilibrium is explicable in terms of features of the simulation despite the effect only showing itself after a few generations have been run.
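A minimal sketch of a Schelling-style simulation may make the point concrete. Only the two agent types, the random initial distribution and the 30% preference threshold come from the note. The grid size, the share of empty cells, the wrap-around neighbourhood, the random relocation rule and all function names are assumptions made for illustration. Every step of the run is open to inspection, which is the sense in which the macro-level pattern remains explicable.

```python
import random

EMPTY = None  # marker for a vacant cell (an assumption of this sketch)


def make_grid(size, empty_fraction, rng):
    """Randomly distribute two kinds of agents ('A', 'B') and empty cells."""
    n_cells = size * size
    n_empty = int(n_cells * empty_fraction)
    n_agents = n_cells - n_empty
    cells = ["A"] * (n_agents // 2) + ["B"] * (n_agents - n_agents // 2)
    cells += [EMPTY] * n_empty
    rng.shuffle(cells)
    return [cells[r * size:(r + 1) * size] for r in range(size)]


def like_fraction(grid, r, c):
    """Fraction of occupied neighbouring cells holding the same kind of agent."""
    size = len(grid)
    occupants = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            other = grid[(r + dr) % size][(c + dc) % size]  # wrap-around edges
            if other is not EMPTY:
                occupants.append(other)
    if not occupants:
        return 1.0  # an agent with no neighbours counts as satisfied
    return sum(1 for o in occupants if o == grid[r][c]) / len(occupants)


def step(grid, threshold, rng):
    """Move each currently unsatisfied agent to a random empty cell."""
    size = len(grid)
    coords = [(r, c) for r in range(size) for c in range(size)]
    unhappy = [(r, c) for r, c in coords
               if grid[r][c] is not EMPTY and like_fraction(grid, r, c) < threshold]
    empties = [(r, c) for r, c in coords if grid[r][c] is EMPTY]
    rng.shuffle(unhappy)
    moved = 0
    for r, c in unhappy:
        if not empties:
            break
        i = rng.randrange(len(empties))
        er, ec = empties[i]
        grid[er][ec], grid[r][c] = grid[r][c], EMPTY
        empties[i] = (r, c)  # the vacated cell becomes available
        moved += 1
    return moved


def mean_like_fraction(grid):
    """Average like-neighbour fraction over all agents (a crude segregation index)."""
    size = len(grid)
    values = [like_fraction(grid, r, c)
              for r in range(size) for c in range(size) if grid[r][c] is not EMPTY]
    return sum(values) / len(values)


def simulate(size=20, empty_fraction=0.1, threshold=0.3, max_steps=50, seed=0):
    """Run until no agent moves or max_steps is reached; return (before, after, grid)."""
    rng = random.Random(seed)
    grid = make_grid(size, empty_fraction, rng)
    before = mean_like_fraction(grid)
    for _ in range(max_steps):
        if step(grid, threshold, rng) == 0:
            break
    return before, mean_like_fraction(grid), grid


if __name__ == "__main__":
    before, after, _ = simulate()
    print(f"mean like-neighbour fraction: before={before:.2f}, after={after:.2f}")
```

Comparing the index before and after a run gives a rough measure of how much segregation the mild preference has produced for a given seed and parameter setting.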
I thank Edouard Machery for pointing this worry out to me.
Humphreys does go on to define essential epistemic opacity: “a process is essentially epistemically opaque to X if and only if it is impossible, given the nature of X, for X to know all of the epistemically relevant elements of the process” (2009: 650). It is unclear what is meant exactly by “epistemically relevant elements” here. Durán and Formanek (2018) interpret it in terms of some sort of surveyability of steps in finite time. Nevertheless, one worries about the historical applicability of such a definition to times before a particular scientific advance. Surely relativity might have seemed epistemically opaque to Newtonians? The definition assumes we have a clear grasp of the limits of our natures and knowledge.
Weisberg (2007) calls this modelling technique “multiple models idealization”.
Again, see Durán and Formanek (2018) for a computational version of reliabilism as a tool to capture surveyability and epistemic access in the service of grounding trust in complex systems.
Ananny and Crawford (2016) question the ideal of transparency in computational systems itself. They discuss a number of issues with the ideal and conclude that a larger “sociotechnical” appreciation of the interaction between machines and humans is necessary in order to reconstruct the notion of accountability in computational settings. Robbins (2019) also questions transparency but offers “envelopment” of AI systems as an approach to their uncertainty or opacity, in which we contain or limit their impact on and potential harm to humans.
Technically, dynamics or updates should not preclude the possibility of transparency. Dynamic semantics, based as it is on dynamic logic, is not epistemically opaque in any sense relevant here, and although static concepts of meaning are jettisoned for context change potentials or updates, meaningful parts are clearly identifiable. See Groenendijk and Stokhof (1990) and Veltman (1991) for clear descriptions of the general framework.
Sullivan interprets this situation as one of “link uncertainty”, in which understanding the intricacies of the model is not paramount; rather, the epistemic opacity is generated by a lack of understanding of the link between the model and the target phenomenon.
Many ethical discussions have centred around the possibility or necessity of “opening the black-boxes” or the “right to explanation” (such as the EU’s General Data Protection Regulation legislation). These discussions are of course beyond the present scope but see Robbins (2019) for an alternative approach to the ethical issues around black boxes in AI.
Similarly for the salience-based methods of describing image classifier tasks discussed in Ribeiro et al. (2016).
Of course, compositionality could apply in the visual domain similarly. The argument could go as follows: people seem to interpret visual stimuli they have never encountered before and they do so in a systematic way; the best explanation is that they accomplish this by relying on the smallest interpretable parts of the stimuli and the way those parts are combined. So, visual interpretation must be compositional. I thank Zoltán Szabó for this observation.
More direct approaches to identifying structure in networks do exist. One famous example is Smolensky’s (1990) tensor product representations which aimed at capturing variable binding and symbolic processing while remaining true to the neural net architecture of classical connectionism. See McCoy et al. (2019) for a more recent adaptation of this idea on RNNs.
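The core idea of tensor product binding can be sketched in a few lines. This is an illustration under stated assumptions, not a reproduction of Smolensky’s or McCoy et al.’s implementations: the filler vectors, the role vectors and the function names `bind` and `unbind` are all hypothetical. Each filler (e.g. a word) is bound to a role (e.g. a structural position) by an outer product, the bound pairs are summed into one representation, and a filler is recovered by multiplying with its role vector, which is exact when the role vectors are orthonormal.

```python
def outer(filler, role):
    """Outer product of a filler vector and a role vector (a matrix as nested lists)."""
    return [[f * r for r in role] for f in filler]


def bind(pairs):
    """Sum the outer products of (filler, role) pairs into a single tensor."""
    total = None
    for filler, role in pairs:
        term = outer(filler, role)
        if total is None:
            total = term
        else:
            total = [[a + b for a, b in zip(row_t, row_n)]
                     for row_t, row_n in zip(total, term)]
    return total


def unbind(tensor, role):
    """Recover the filler bound to `role`; exact only for orthonormal role vectors."""
    return [sum(t * r for t, r in zip(row, role)) for row in tensor]


# Hypothetical filler vectors (arbitrary values) and orthonormal role vectors.
DOG = [1.0, 0.0, 2.0]
CAT = [0.0, 3.0, 1.0]
SUBJECT = [1.0, 0.0]
OBJECT = [0.0, 1.0]

structure = bind([(DOG, SUBJECT), (CAT, OBJECT)])
print(unbind(structure, SUBJECT))  # recovers the DOG vector
print(unbind(structure, OBJECT))   # recovers the CAT vector
```

Because the parts can be read back out of the combined representation, structure of this kind is identifiable even though it is stored in a distributed form.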
I thank an anonymous reviewer for pointing me in the direction of this research.
References
Ananny, M., & Crawford, K. (2016). Seeing without knowing: Limitations of the transparency ideal and its application to algorithmic accountability. New Media & Society, 20(3), 973–989.
Andreas, J. (2019). Measuring compositionality in representation learning. ICLR.
Baggio, G., van Lambalgen, M., & Hagoort, P. (2012). The processing consequences of compositionality. In M. Werning, W. Hinzen, & E. Machery (Eds.), The Oxford handbook of compositionality (pp. 655–672). Oxford: Oxford University Press.
Barker, C., & Jacobson, P. (Eds.). (2007). Direct compositionality. Oxford: Oxford University Press.
Baroni, M. (2019). Linguistic generalization and compositionality in modern artificial neural networks. Retrieved from ArXiv preprint arXiv:1904.00157, to appear in the Philosophical Transactions of the Royal Society B.
Bastings, J., Aziz, W., & Titov, I. (2019). Interpretable neural predictions with differentiable binary variables. Retrieved from arXiv:1905.08160.
Blutner, R., Hendriks, P., De Hoop, H., & Schwartz, O. (2004). When compositionality fails to predict systematicity. In S. D. Levy & R. Gayler (Eds.), Compositional connectionism in cognitive science: Papers from the AAAI fall symposium (pp. 6–11). Arlington: The AAAI Press.
Brandom, R. (1994). Making it explicit. Harvard: Harvard University Press.
Brandom, R. (2007). Inferentialism and some of its challenges. Philosophy and Phenomenological Research, 74(3), 651–676.
Chomsky, N. (1982). Some concepts and consequences of the theory of government and binding. Cambridge: MIT Press.
Cooper, R. (1975). Montague’s semantic theory and transformational syntax. Ph.D. Thesis, University of Massachusetts, Amherst.
Croft, W. (2001). Radical construction grammar. Oxford: Oxford University Press.
Davidson, D. (1967). Inquiries into truth and interpretation: Philosophical essays. Oxford: Oxford Clarendon Press.
Dever, J. (1999). Compositionality as methodology. Linguistics and Philosophy, 22(3), 311–326.
Dever, J. (2012). Compositionality. In The Routledge handbook to the philosophy of language (pp. 91–102).
Dowty, D. (1979). Word meaning and Montague grammar: The semantics of verbs and times in generative semantics and in Montague’s PTQ. Dordrecht: Reidel.
Dowty, D. (2007). Compositionality as an empirical problem. In C. Barker & P. Jacobson (Eds.), Direct compositionality (pp. 23–101). Oxford: Oxford University Press.
Durán, J., & Formanek, N. (2018). Grounds for trust: Essential epistemic opacity and computational reliabilism. Minds and Machines, 28, 645–666.
Elman, J. (1991). Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7, 195–225.
Evans, G. (1981). Semantic theory and tacit knowledge. Collected papers (pp. 322–342). Oxford: Clarendon Press.
Fodor, J., & Pylyshyn, Z. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28(1–2), 3–71.
Frege, G. (1908). Über Sinn und Bedeutung. Zeitschrift für Philosophie und philosophische Kritik 100 (1892) 25–50; translated as ‘On Sense and Reference’ in P. T. Geach and M. Black, Translations from the Philosophical Writings of Gottlob Frege, Blackwell, Oxford, 1960.
Frege, G. (1919). Notes for Ludwig Darmstaedter (Logik in der Mathematik), in Frege 1979: 253–257.
Frigg, R., & Reiss, J. (2009). The philosophy of simulation: Hot new issues or same old stew? Synthese, 169, 593–613.
Fodor, J. (1983). The modularity of mind. Cambridge: MIT Press.
Goldberg, A. (2015). Compositionality. In N. Reimer (Ed.), The Routledge handbook of semantics (pp. 419–433). London: Routledge.
Goldberg, Y. (2017). Neural network methods for natural language processing. San Francisco: Morgan & Claypool.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. Cambridge: MIT Press.
Groenendijk, J., & Stokhof, M. (1990). Dynamic Montague grammar. In L. Kalman & L. Polos (Eds.), Papers from the second symposium on logic and language (pp. 3–48). Budapest: Akadémiai Kiadó.
Groenendijk, J., & Stokhof, M. (2005). Why compositionality? In G. Carlson & J. Pelletier (Eds.), Reference and quantification: The partee effect (pp. 83–106). Stanford: CSLI Press.
Gulordava, K., Bojanowski, P., Grave, E., Linzen, T. & Baroni, M. (2018). Colorless green recurrent networks dream hierarchically. In Proceedings of NAACL, pp 1195–1205, New Orleans, LA.
Haspelmath, M. (2011). The indeterminacy of word segmentation and the nature of morphology and syntax. Folia Linguistica, 45(1), 31–80.
Heim, I., & Kratzer, A. (1998). Semantics in generative grammar. Oxford: Blackwell.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Hodges, W. (2012). Formalizing the relationship between meaning and syntax. In M. Werning, W. Hinzen, & E. Machery (Eds.), The Oxford handbook of compositionality (pp. 245–261). Oxford: Oxford University Press.
Humphreys, P. (2009). The philosophical novelty of computer simulation methods. Synthese, 169, 615–626.
Hupkes, D., Dankers, V., Mul, M., Bruni, E. (2019). The compositionality of neural networks: Integrating symbolism and connectionism. Retrieved from arXiv:1908.08351.
Jackendoff, R. (1990). Semantic structures. Cambridge: MIT Press.
Jackendoff, R. (2002). The foundations of language: Brain, meaning, grammar, evolution. Oxford: Oxford University Press.
Jacobson, P. (2002). The (dis)organization of the grammar: 25 years. Linguistics and Philosophy, 25, 601–26.
Jakobson, R. (1958/1984). Morphological observations on Slavic declension (the structure of Russian case forms). In L. R. Waugh & M. Halle (Eds.), Roman Jakobson. Russian and Slavic grammar: Studies 1931–1981 (pp. 105–133). Berlin: Mouton de Gruyter.
Janssen, T. (1997). Compositionality. In J. van Benthem & A. ter Meulen (Eds.), Handbook of logic and language (pp. 417–473). Amsterdam: Elsevier Science.
Janssen, T. (2012). Compositionality: Its historic context. In M. Werning, W. Hinzen, & E. Machery (Eds.), The Oxford handbook of compositionality (pp. 19–46). Oxford: Oxford University Press.
Johnson, K. (2004). On the systematicity of language and thought. Journal of Philosophy, 101, 111–139.
Johnson, K. (2015). Notational variants and invariance in linguistics. Mind and Language, 30(2), 162–186.
Kay, P., & Michaelis, L. (2011). Constructional meaning and compositionality. In C. Maienborn, K. von Heusinger, & P. Portner (Eds.), Semantics: An international handbook of natural language meaning. Berlin: Mouton de Gruyter.
Knight, W. (2017). The dark secret at the heart of AI. MIT Technology Review. Retrieved from https://www.technologyreview.com/s/604087/the-dark-secret-at-the-heart-of-ai/.
Krizhevsky, A., Sutskever, I., & Hinton, G. (2012). ImageNet classification with deep convolutional neural networks. Advances in neural information processing systems (pp. 1097–1105).
Lake, B., & Baroni, M. (2018). Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks. In Proceedings of ICML, pp 2879–2888, Stockholm, Sweden.
Lappin, S. & Zadrozny, W. (2000). Compositionality, synonymy, and the systematic representation of meaning. arXiv:cs/0001006.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521, 436–444.
Lei, T., Barzilay, R., & Jaakkola, T. (2016). Rationalizing neural predictions. In Proceedings of the 2016 conference on empirical methods in natural language processing. Association for Computational Linguistics.
Lenhard, J., & Winsberg, E. (2010). Holism, entrenchment, and the future of climate model pluralism. Studies in History and Philosophy of Modern Physics, 41, 253–262.
Leśniewski, S. (1916). Podstawy ogólnej teoryi mnogości. I, Moscow: Prace Polskiego Kola Naukowego w Moskwie, Sekcya matematyczno-przyrodnicza; Eng. trans. by D. I. Barnett: ‘Foundations of the General Theory of Sets. I’, in S. Leśniewski, Collected Works (ed. by S. J. Surma et al.), Dordrecht: Kluwer, 1992, vol. 1 (pp. 129–173).
Liang, P., & Potts, C. (2015). Bringing machine learning and compositional semantics together. Annual Review of Linguistics, 1(1), 355–376.
Marcus, G. (2003). The algebraic mind. Cambridge: MIT Press.
Marcus, G. (2018). Deep learning: A critical appraisal. Retrieved from arXiv:1801.00631.
Marr, D. (1982). Vision. New York: W.H. Freeman and Company.
Martin, A., & Baggio, G. (2019). Modelling meaning composition from formalism to mechanism. Philosophical Transactions of the Royal Society B 375.
McCoy, T., Linzen, T., Dunbar, E., & Smolensky, P. (2019). RNNs implicitly implement tensor product representations. ICLR.
Meyes, R., Lu, M., de Puiseau, C. W., & Meisen, T. (2019). Ablation studies in artificial neural networks. CoRR. Retrieved from arXiv:1901.08644.
Miotto, R., Li, L., Kidd, B. A., & Dudley, J. T. (2016). Deep patient: An unsupervised representation to predict the future of patients from the electronic health records. Scientific Reports, 6(26094), 1–10.
Montague, R. (1974). The proper treatment of quantification in ordinary English. Approaches to natural language (pp. 221–242). Dordrecht: Springer.
Morgan, J. (1969). On arguing about semantics. Papers in Linguistics, 1, 49–70.
Müller, V. (2019). Ethics of AI and robotics. In E. Zalta (Ed.), Stanford encyclopedia of philosophy. Palo Alto: CSLI, Stanford University.
Nefdt, R. (2019). The ontology of words: A structural approach. Inquiry, 62(8), 877–911.
Newman, J. (2016). Epistemic opacity, confirmation holism and technical debt: Computer simulation in the light of empirical software engineering. In F. Gadducci & M. Tavosanis (Eds.), History and philosophy of computing—third international conference, HaPoC 2015, Pisa, Italy, October 8–11, 2015, Revised Selected Papers (pp. 256–272). Dordrecht: Springer.
Pagin, P., & Westerstahl, D. (2010). Compositionality I: Definitions and variants. Philosophy Compass, 5(3), 250–264.
Partee, B. (2004). Compositionality in formal semantics. Oxford: Blackwell.
Pelletier, J. (2012). Holism and compositionality. In M. Werning, W. Hinzen, & E. Machery (Eds.), The Oxford handbook of compositionality (pp. 149–174). Oxford: Oxford University Press.
Pietroski, P. (2018). Conjoining meanings: Semantics without truth values. Oxford: Oxford University Press.
Pinker, S. (1984). Language learnability and language development. Cambridge: Harvard University Press.
Plebe, A., & Grasso, G. (2019). The unbearable shallow understanding of deep learning. Minds and Machines, 29, 515–553.
Pratt, V. R. (1979). Models of program logics. In 20th Annual Symposium on Foundations of Computer Science (sfcs 1979), San Juan, Puerto Rico, USA, pp. 115–122.
Pustejovsky, J. (1995). The generative lexicon. Cambridge: The MIT Press.
Pylyshyn, Z. (1984). Computation and cognition. Cambridge: MIT Press.
Rambow, O., & Joshi, A. (1992). A formal look at dependency grammars and phrase structure grammars, with special consideration of word-order phenomena. In International workshop on the meaning-text theory. Darmstadt. Arbeitspapiere der GMD, 671, 47–66.
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). Why should I Trust You? Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135–1144).
Robbins, S. (2019). AI and the path to envelopment: Knowledge as a first step towards the responsible regulation and use of AI-powered machines. AI & Society. https://doi.org/10.1007/s00146-019-00891-1.
Rumelhart, D., McClelland, J., & the PDP Research Group (Eds.). (1986). Parallel distributed processing: Explorations in the microstructure of cognition: Foundations (Vol. 1). Cambridge: MIT Press.
Schubbach, A. (2019). Judging machines: Philosophical aspects of deep learning. Synthese (online first).
Silver, D., Huang, A., Maddison, C., Guez, A., Sifre, L., van den Driessche, G., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489.
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., et al. (2017a). Mastering chess and Shogi by self-play with a general reinforcement learning algorithm. Retrieved from arXiv preprint arXiv:1712.01815.
Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., et al. (2017b). Mastering the game of Go without human knowledge. Nature, 550, 354–359.
Smolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence, 46(1–2), 159–216.
Stöckler, M. (2000). On modelling and simulations as instruments for the study of complex systems. In M. Carrier (Ed.), Science at century’s end: Philosophical questions on the progress and limits of science. Pittsburgh: University of Pittsburgh Press.
Sullivan, E. (2019). Understanding from machine learning models. British Journal for the Philosophy of Science (forthcoming).
Sutskever, I., Vinyals, O., & Le, Q. (2014). Sequence to sequence learning with neural networks. In Proceedings of NIPS (pp. 3104–3112). Montreal, Canada.
Szabó, Z. (2000). The Problem of compositionality. Abingdon: Routledge Press.
Szabó, Z. (2007). Compositionality. In E. Zalta, (ed.), The Stanford Encyclopedia of Philosophy (Spring 2007 Edition). Retrieved from http://plato.stanford.edu/archives/spr2007/entries/compositionality/.
Szabó, Z. (2012). The case for compositionality. In M. Werning, W. Hinzen, & E. Machery (Eds.), The Oxford handbook of compositionality (pp. 64–80). Oxford: Oxford University Press.
Tarski, A. (1933). The concept of truth in the languages of the deductive sciences. Reprinted in Zygmunt 1995 (pp. 13–172); expanded English translation in Tarski 1983 [1956] (pp. 152–278).
van Gelder, T. (1990). Compositionality: A connectionist variation on a classical theme. Cognitive Science, 14, 355–384.
van Gelder, T. J., & Port, R. (1994). Beyond symbolic: Towards a Kama-Sutra of compositionality. In V. Honavar & L. Uhr (Eds.), Artificial intelligence and neural networks: Steps toward principled integration (pp. 107–125). San Diego: Academic Press.
Veltman, F. (1991). Defaults in update semantics. In Hans Kamp (Ed.), Conditionals, defaults and belief revision. Dyana Deliverable R2.5A: Edinburgh.
Weisberg, M. (2007). Three kinds of idealization. The Journal of Philosophy, 104(12), 639–659.
Werning, M. (2005). Right and wrong reasons for compositionality. In M. Werning (Ed.), The Compositionality of Meaning and Content (vol. 1, Foundational Issues, pp. 285–309). Frankfurt: Ontos Verlag.
Werning, M. (2012). Non-symbolic compositional representation and its neuronal foundation: Towards an emulative semantics. In M. Werning, W. Hinzen, & E. Machery (Eds.), The Oxford handbook of compositionality (pp. 633–654). Oxford: Oxford University Press.
Wittgenstein, L. (1953). Philosophical investigations. In G. Anscombe & R. Rhees (Eds.), G.E.M. Anscombe (trans.). Oxford: Blackwell.
Yu, M., Chang, S., & Jaakkola, T. (2019). Learning corresponded rationales for text matching. Retrieved from https://openreview.net/forum?id=rklQas09tm.
Acknowledgements
I would like to thank Kyle Blumberg, Edouard Machery, Zoltán Szabó, Bernhard Weiss and two anonymous reviewers of this journal for their detailed and useful comments on the content. I would also like to thank the organisers and fellow participants of the Compositionality in Brains and Machines workshop held at the Lorentz Center at the University of Leiden in August 2019, especially Marco Baroni, Dieuwke Hupkes, and Jelle Zuidema, and the staff of the Lorentz Center itself. Lastly, I would like to thank fellow participants at the Language, Concepts, and Science workshop held at the University of Johannesburg in October 2019.
Cite this article
Nefdt, R.M. A Puzzle concerning Compositionality in Machines. Minds & Machines 30, 47–75 (2020). https://doi.org/10.1007/s11023-020-09519-6
DOI: https://doi.org/10.1007/s11023-020-09519-6