Abstract
The successes of Machine Learning, and in particular of Deep Learning systems, have led to a reformulation of the Artificial Intelligence agenda. One of the pressing issues in the field is the extraction of knowledge out of the behavior of those systems. In this paper we propose a semiotic analysis of that behavior, based on the formal model of learners. We analyze the topos-theoretic properties that ensure the logical expressivity of the knowledge embodied by learners. Furthermore, we show that there exists an ideal universal learner, able to interpret the knowledge gained about any possible function as well as about itself, which can be monotonically approximated by networks of increasing size.
Data Availability
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
References
Atkin, A.: Peirce’s Theory of Signs, In: Zalta, E. (ed.) The Stanford Encyclopedia of Philosophy, (2013). https://plato.stanford.edu/archives/sum2013/entries/peirce-semiotics
Bauer, A., Lumsdaine, P.L.: On the Bourbaki-Witt Principle in Toposes. Math. Proc. Cambridge Philos. Soc. 155, 87–99 (2013)
Belfiore, J.C., Bennequin, D.: Topos and Stacks of Deep Neural Networks (2021). arXiv:2106.14587
Bommasani, R., et al.: On the Opportunities and Risks of Foundation Models, (2022). arXiv:2108.07258
Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H.: Sparks of Artificial General Intelligence: Early Experiments with GPT-4, (2023). arXiv:2303.12712
Davies, A., Veličković, P., Buesing, L., Blackwell, S., Zheng, D., Tomašev, N., Tanburn, R., Battaglia, P., Blundell, C., Juhász, A., Lackenby, M., Williamson, G., Hassabis, D., Kohli, P.: Advancing Mathematics by Guiding Human Intuition with AI. Nature 600(7887), 70–74 (2021)
Ferruz, N., Zitnik, M., Oudeyer, P.Y., Hine, E., Sengupta, N., Shi, Y., Mincu, D., Porsdam Mann, S., Das, P., Stella, F.: Anniversary AI reflections. Nat. Mach. Intell. 6(1), 6–12 (2024)
Emmenegger, J., Pasquali, F., Rosolini, G.: A Characterisation of Elementary Fibrations. Ann. Pure Appl. Log. 173(6), 103103 (2022)
Fong, B., Spivak, D., Tuyéras, R.: Backprop as Functor: A Compositional Perspective on Supervised Learning. Proceedings of the 34th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), 1-13 (2019)
Fortnow, L.: Fifty Years of P vs. NP and the Possibility of the Impossible. Commun. ACM 65, 76–85 (2022)
Gavranovic, B.: Meta-Learning and Monads, (2021). https://www.brunogavranovic.com/posts/2021-10-13-meta-learning-and-monads.html
Hedges, J.: From Open Learners to Open Games, (2019). arXiv:1902.08666
Hutter, M.: Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability, Springer Science and Business Media (2004)
Johnstone, P. T.: Sketches of an Elephant vol. 1 and 2, Oxford University Press (2002)
Kirchherr, W., Li, M., Vitányi, P.: The Miraculous Universal Distribution. Math. Intell. 19, 7–15 (1997)
Kelly, G.M.: Basic Concepts of Enriched Category Theory. Cambridge University Press (1982)
Lee, M.: A Mathematical Investigation of Hallucination and Creativity in GPT Models. Mathematics 11, 2320 (2023)
Lee, M.: A Mathematical Interpretation of Autoregressive Generative Pre-Trained Transformer and Self-Supervised Learning. Mathematics 11, 2451 (2023)
Lipman, B.L.: How to Decide How to Decide How to \(...\): Modeling Limited Rationality. Econometrica 59, 1105–1125 (1991)
MacLane, S., Moerdijk, I.: Sheaves in Geometry and Logic: a First Introduction to Topos Theory, Springer Science & Business Media (2012)
McLarty, C.: Elementary Categories, Elementary Toposes. Clarendon Press (1992)
Mitchell, M., Krakauer, D.C.: The Debate over Understanding in AI’s Large Language Models. Proc. Natl. Acad. Sci. 120(13), e2215907120 (2023)
Olah, C.: Neural Networks, Types, and Functional Programming, (2015). https://colah.github.io/posts/2015-09-NN-Types-FP/
Peirce, C.S. ed. by Bellucci, F.: Charles S. Peirce: Selected Writings on Semiotics 1894-1912, De Gruyter/Mouton (2020)
Priss, U.: Semiotic-Conceptual Analysis: a Proposal. Int. J. General Syst. 46(5), 569–585 (2017)
Priss, U.: A Semiotic Perspective on Polysemy. Ann. Math. Artif. Intell. 90(11–12), 1125–1138 (2022)
Rezk, C.: Toposes and Homotopy Toposes, (2010). https://faculty.math.illinois.edu/rezk/homotopy-topos-sketch.pdf
Schmidhuber, J.: Annotated History of Modern AI and Deep Learning, (2022). arXiv:2212.11279
Shiebler, D., Gavranović, B., Wilson, P.: Category Theory in Machine Learning, (2021). arXiv:2106.07032
Southwell, R., Gupta, N.: Categories and Toposes: Visualized and Explained, KDP Publishing (2021)
Spivak, D.I.: Category Theory for the Sciences. MIT Press (2014)
Spivak, D.I.: Learners’ Languages, (2021). arXiv:2103.01189
Stepin, I., Alonso, J.M., Catala, A., Pereira-Fariña, M.: A Survey of Contrastive and Counterfactual Explanation Generation Methods for Explainable Artificial Intelligence. IEEE Access 9, 11974–12001 (2021)
Vickers, P., Faith, J., Rossiter, N.: Understanding Visualization: A Formal Approach using Category Theory and Semiotics. IEEE Trans. Vis. Comput. Graph. 19, 1048–1061 (2012)
Yuan, Y.: On the Power of Foundation Models. In: Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202 (2023)
Zimmermann, A.: Philosophers on GPT-3, Daily Nous (2020) https://dailynous.com/2020/07/30/philosophers-gpt-3/
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Proof of Proposition 1
We prove this only for the case of a complete category \(\mathcal {C}\). The case of a cocomplete category is analogous. Consider an indexing category \(\mathcal {I}\) and a functor \(F: \mathcal {I} \rightarrow \mathcal {C}^{\textbf{2}}\). This constitutes a diagram. The limit of diagram F is required to be an object \(\lim F\) in \(\mathcal {C}^{\textbf{2}}\) and morphisms \(\hat{f}_x: \lim F \rightarrow F(x)\) for each \(x \in \text{ Ob }(\mathcal {I})\) such that, if there exists another object \(\beta \) in \(\mathcal {C}^{\textbf{2}}\) with morphisms \(\hat{f}_x^{\prime }: \beta \rightarrow F(x)\) for each \(x \in \text{ Ob }(\mathcal {I})\) there exists a unique morphism \(!: \beta \rightarrow \lim F\) that makes everything commute.
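Then, consider the two projection functors \(E_n: \mathcal {C}^{\textbf{2}} \rightarrow \mathcal {C}\), given by \(E_n(\alpha ) = \alpha (n)\), for \(n \in \text{ Ob }(\textbf{2})\). Given any diagram \(F: \mathcal {I} \rightarrow \mathcal {C}^{\textbf{2}}\) we obtain \(E_n \circ F: \mathcal {I} \rightarrow \mathcal {C}\), a diagram in \(\mathcal {C}\). Since this category is complete, there exists a limit of this functor, \(\lim E_n \circ F\).
Then, a functor \(\alpha _F: \textbf{2} \rightarrow \mathcal {C}\) can be defined, such that \(\alpha _F(n) = \lim E_n \circ F\) for \(n = 0,1\) and \(\alpha _F(0 \rightarrow 1)\) is the unique morphism \(\lim E_0 \circ F \rightarrow \lim E_1 \circ F\) induced by the universal property of \(\lim E_1 \circ F\).
\(\alpha _F\) yields the limit of diagram F. \(\square \)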
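As an illustration of the pointwise construction, the following minimal sketch computes a binary product in \(\mathcal {C}^{\textbf{2}}\) under the assumption that \(\mathcal {C}\) is the category of finite sets. The encoding of an object as a triple `(dom, cod, fn)` and the helper names `arrow` and `pointwise_product` are hypothetical choices made only for this example.

```python
from itertools import product

def arrow(dom, cod, fn):
    """Build an object of Set^2: finite sets dom, cod and a dict dom -> cod."""
    assert set(fn) == set(dom) and set(fn.values()) <= set(cod)
    return (frozenset(dom), frozenset(cod), dict(fn))

def pointwise_product(a, b):
    """Product in Set^2, computed separately at n = 0 and n = 1."""
    (a0, a1, f), (b0, b1, g) = a, b
    dom = frozenset(product(a0, b0))   # limit of E_0 . F
    cod = frozenset(product(a1, b1))   # limit of E_1 . F
    # Arrow induced between the two componentwise limits.
    fn = {(x, y): (f[x], g[y]) for (x, y) in dom}
    return (dom, cod, fn)

alpha = arrow({"x", "y"}, {0}, {"x": 0, "y": 0})
beta = arrow({"u"}, {1, 2}, {"u": 2})
prod = pointwise_product(alpha, beta)

# The projection onto alpha is a morphism of Set^2: its square commutes.
proj_commutes = all(prod[2][(x, y)][0] == alpha[2][x] for (x, y) in prod[0])
```

The same pattern applies to any finite diagram: compute the limit at each of the two objects of \(\textbf{2}\) and connect the results with the induced arrow.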
Proof of Proposition 2
A category \(\mathcal {C}\) is cartesian closed if it satisfies the following three properties: (i) it has a terminal object, (ii) for any pair \(x,y \in \text{ Ob }(\mathcal {C})\) their product \(x \times y\) exists in \(\mathcal {C}\), (iii) for any pair \(a,b \in \text{ Ob }(\mathcal {C})\) there exists their exponential \(a^b\) in \(\mathcal {C}\).
Conditions (i) and (ii) are satisfied in \(\mathcal {C}^{\textbf{2}}\), according to Proposition 1.
With respect to (iii), notice that we can define a bifunctor
where
and \(n, n' \in \text{ Ob }(\textbf{2})\), i.e. \(n, n' \in \{0,1\}\). Note that each \(\alpha (n')^{\beta (n')}\) is an exponential object in \(\mathcal {C}\), which exists since \(\mathcal {C}\) is a cartesian closed category.
Consider a family of wedges. Each wedge consists of \(\gamma \in \text{ Ob }(\mathcal {C}^{\textbf{2}})\) and morphisms
for every object \(n \in \text{ Ob }(\textbf{2})\), such that the following diagram commutes:
where \(f: n \rightarrow n'\), \(id_n: n \rightarrow n\) and \(id_{n'}: n' \rightarrow n'\) are morphisms in \(\textbf{2}\).
An end for \(\bar{\alpha }^{\bar{\beta }}\) is a universal wedge such that, for each \(n \in \text{ Ob }(\textbf{2})\) it is denoted
with projections \(p_{n'}\) for \(n'=0,1\) such that for any other object \(\gamma \in \text{ Ob }(\mathcal {C}^{\textbf{2}})\) and morphisms \(\{w_{n'}\}_{n' \in \text{ Ob }(\textbf{2})}\), there exists a unique morphism that makes the following diagram commutative:
To see that \(\alpha ^{\beta }\) is an exponential object consider the class of morphisms \(\text{ Hom}_{\mathcal {C}^{\textbf{2}}}(\gamma , \alpha ^{\beta })\) defined as
Since \(\mathcal {C}\) is cartesian closed, we have that
.
Thus, \(\alpha ^{\beta }\) is an exponential object in \(\mathcal {C}^{\textbf{2}}\). \(\square \)
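For a concrete check, the sketch below takes \(\mathcal {C}\) to be finite sets and uses the standard description of exponentials in \(\textbf{Set}^{\textbf{2}}\) rather than the end construction of the proof: at \(n = 1\) the exponential consists of all functions \(\beta (1) \rightarrow \alpha (1)\), and at \(n = 0\) of all commuting squares from \(\beta \) to \(\alpha \). All function names and the example objects are hypothetical. The sketch then compares the number of morphisms \(\gamma \times \beta \rightarrow \alpha \) with the number of morphisms \(\gamma \rightarrow \alpha ^{\beta }\) for one choice of \(\gamma \).

```python
from itertools import product

def functions(dom, cod):
    """All functions dom -> cod between finite lists, each as a tuple of pairs."""
    return [tuple(zip(dom, values)) for values in product(cod, repeat=len(dom))]

def morphisms(gamma, delta):
    """Morphisms gamma -> delta in Set^2: commuting squares (k0, k1)."""
    (g0, g1, g), (d0, d1, d) = gamma, delta
    return [(k0, k1)
            for k0 in functions(g0, d0)
            for k1 in functions(g1, d1)
            if all(d[dict(k0)[x]] == dict(k1)[g[x]] for x in g0)]

def exponential(alpha, beta):
    """alpha^beta in Set^2 (standard formula, assumed for this sketch)."""
    level1 = functions(beta[1], alpha[1])        # component at n = 1
    level0 = morphisms(beta, alpha)              # component at n = 0
    restrict = {square: square[1] for square in level0}
    return (level0, level1, restrict)

def binary_product(gamma, beta):
    """Pointwise product of two objects of Set^2."""
    (g0, g1, g), (b0, b1, b) = gamma, beta
    dom = list(product(g0, b0))
    cod = list(product(g1, b1))
    return (dom, cod, {(x, y): (g[x], b[y]) for (x, y) in dom})

alpha = ([0, 1], [0, 1], {0: 0, 1: 1})
beta = ([0], [0, 1], {0: 1})
gamma = ([0, 1], [0], {0: 0, 1: 0})

lhs = len(morphisms(binary_product(gamma, beta), alpha))
rhs = len(morphisms(gamma, exponential(alpha, beta)))
```

Equal counts for `lhs` and `rhs` are consistent with the bijection \(\text{ Hom}(\gamma \times \beta , \alpha ) \cong \text{ Hom}(\gamma , \alpha ^{\beta })\); they do not by themselves establish naturality.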
Proof of Proposition 3
Consider an object \(\bar{\Omega }\) in \(\mathcal {C}\) such that:
obtained as the pullback of \(h: \Omega \times \Omega \rightarrow \Omega \) and \(T: 1 \rightarrow \Omega \) in \(\mathcal {C}\):
Given \(\varvec{\omega }: \bar{\Omega } \rightarrow \Omega \in \mathcal {C}^{\textbf{2}}\), we claim that there exists a monomorphism \(\top _{\mathcal {C}^{\textbf{2}}}: \text{ id}_{1} \hookrightarrow \varvec{\omega }\) (where \(\text{ id}_1\) is the identity arrow of the terminal object 1 in \(\mathcal {C}\)) such that for every monomorphism \(f: \mu \hookrightarrow \alpha \), there exists a unique morphism \(\chi _f: \alpha \rightarrow \varvec{\omega }\) that makes the following diagram commutative:
This is ensured if, on one hand, the following diagrams commute in the topos \(\mathcal {C}\):
for \(n = 0, 1 \in \text{ Ob }(\textbf{2})\). This follows trivially from the definition of monomorphisms in \(\mathcal {C}^{\textbf{2}}\). On the other hand, an additional requirement is that \(\mu (0) \hookrightarrow \alpha (0)\) factors through the pullback \(f(0,1): \mu (1) \times _{\alpha (1)} \alpha (0) \hookrightarrow \alpha (0)\). Informally, this means that \(\bar{\Omega }\) makes the following diagram commutative:
Thus, the subobject classifier in \(\mathcal {C}^{\textbf{2}}\) is \(\varvec{\omega } = \hat{\Omega }: \bar{\Omega } \rightarrow \Omega \), which lifts the subobject classifier \(\Omega \) in \(\mathcal {C}\). \(\square \)
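The construction can be made concrete for \(\mathcal {C} = \textbf{Set}\) with finite sets. The sketch below assumes that \(h\) is implication on truth values, so that \(\bar{\Omega }\) consists of the pairs \((u, v)\) with \(u \Rightarrow v\), and that the classifying arrow \(\bar{\Omega } \rightarrow \Omega \) is the second projection. Both are assumptions made for illustration, and the names `OMEGA_BAR`, `omega` and `characteristic_map` are hypothetical.

```python
# Omega_bar: pairs (u, v) of truth values with u implies v (assumed reading of h).
OMEGA_BAR = [(u, v) for u in (False, True) for v in (False, True) if (not u) or v]

def omega(pair):
    """Assumed classifier arrow Omega_bar -> Omega: the second projection."""
    return pair[1]

def characteristic_map(alpha, m0, m1):
    """Classify the subobject (m0 -> m1) of alpha = (a0, a1, f) in Set^2."""
    a0, a1, f = alpha
    # A subobject must be closed under f: elements of m0 land in m1.
    assert all(f[x] in m1 for x in m0)
    chi0 = {x: (x in m0, f[x] in m1) for x in a0}   # component at n = 0
    chi1 = {y: (y in m1) for y in a1}               # component at n = 1
    return chi0, chi1

alpha = (["x", "y", "z"], [0, 1], {"x": 0, "y": 0, "z": 1})
m0, m1 = {"x"}, {0}
chi0, chi1 = characteristic_map(alpha, m0, m1)

# Naturality: applying omega after chi0 agrees with chi1 after f.
natural = all(omega(chi0[x]) == chi1[alpha[2][x]] for x in alpha[0])
# Pulling back "true" along the characteristic map recovers the subobject.
recovered0 = {x for x in alpha[0] if chi0[x] == (True, True)}
recovered1 = {y for y in alpha[1] if chi1[y]}
```

Here the element `"y"` is classified by `(False, True)`: it is not in the subobject at level 0, but its image is in the subobject at level 1, which is the third truth value that \(\bar{\Omega }\) adds to \(\Omega \).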
Proof of Theorem 1
Trivial. If \(\mathcal {C}\) is a topos it is both complete and cocomplete, it has exponentials and a subobject classifier. Thus, according to Propositions 1, 2 and 3, these properties are shared by \(\mathcal {C}^{\textbf{2}}\). \(\square \)
Proof of Proposition 4
Since \(\alpha : \textbf{2} \rightarrow \mathcal {C}\), according to the Yoneda lemma for \(\mathcal {C}\) we have that
for \(n= 0,1 \in \text{ Ob }(\textbf{2})\), where \(F_n\) is the projection of F onto the copy of \(\mathcal {C}\) given by the images of functors at n. Then we can define \(\text{ Hom}_{\mathcal {C}^{\textbf{2}}}\) and F satisfying the Yoneda condition for \(\mathcal {C}^{\textbf{2}}\). \(\square \)
Proof of Lemma 1
Any category \(\mathcal {F}-\textbf{Coalg}\) is such that the objects are functions \(h: S \rightarrow \mathcal {F}(S)\)
and the morphisms are the obvious maps between functions \(h: S \rightarrow \mathcal {F}(S)\) and \(h^{'}: S^{'} \rightarrow \mathcal {F}(S^{'})\) that yield a sign in this context, that is, a commutative diagram
Then, given that,
we have that our \(\mathcal {F}\) is an endofunctor on \(\textbf{Set}\), defined for any set S as \(\mathcal {F}(S) = B^A A^{A \times B} S^{A \times B}\).
According to Theorem 3.3 in [32], \(B^A A^{A \times B} S^{A \times B}-\textbf{Coalg}\) is a category of (co)presheaves on \(\textbf{Set}\). Hence \(\textbf{Learn}(A,B)\) is a topos. \(\square \)
Proof of Theorem 2
\(\textbf{Learn}\) is a category since it satisfies the following properties:
- Composition of morphisms: given two learners, \(A \rightarrow B\) and \(B \rightarrow C\), defined as two equivalence classes \(\langle \bar{P}, \bar{I}, \bar{U}, \bar{r}\rangle \) and \(\langle \bar{P}^{'}, \bar{I}^{'}, \bar{U}^{'}, \bar{r}^{'} \rangle \), their composition is a learner \(A \rightarrow C\) consisting of the equivalence class with representative \(\langle P \times P^{'}, I *I^{'}, U *U^{'}, r *r^{'}\rangle \), where:
  - \(I *I^{'}((p, p^{'}),a) = I^{'}(p^{'}, I(p,a)) \in C\), where \(p \in P\), \(p^{'} \in P^{'}\), \(a \in A\) for \(P \in \bar{P}\) and \(P^{'} \in \bar{P}^{'}\).
  - \(U *U^{'}((p, p^{'}), a, c) = (U(p, a, r^{'}(p^{'}, I(p,a), c)), U^{'}(p^{'}, I(p,a), c)) \in P \times P^{\prime }\).
  - \(r *r^{'}((p, p^{'}), a, c) = r(p, a, r^{'}(p^{'}, I(p,a), c)) \in A\).
  With this specification, the composition of learners is associative. That is, given learners \(\langle \bar{P}, \bar{I}, \bar{U}, \bar{r} \rangle : A \rightarrow B\), \(\langle \bar{P}^{'}, \bar{I}^{'}, \bar{U}^{'}, \bar{r}^{'} \rangle : B \rightarrow C\) and \(\langle \bar{P}^{''}, \bar{I}^{''}, \bar{U}^{''}, \bar{r}^{''} \rangle : C \rightarrow D\), and writing \(b = I(p,a)\) and \(c = r^{''}(p^{''}, I^{'}(p^{'},b), d)\), we have:
  - \(I *[I^{'} *I^{''}]((p, (p^{'}, p^{''})), a)\) \(=\) \([I *I^{'}] *I^{''}(((p, p^{'}), p^{''}), a)\) \(=\) \(I^{''}(p^{''}, I^{'}(p^{'}, I(p,a)))\).
  - \(U *[U^{'} *U^{''}]((p, (p^{'}, p^{''})), a, d)\) \(=\) \([U *U^{'}] *U^{''}(((p, p^{'}), p^{''}), a, d)\) \(=\) \((U(p,a, r^{'}(p^{'}, b, c)),\) \( U^{'}(p^{'}, b, c),\) \( U^{''}(p^{''}, I^{'}(p^{'},b), d))\).
  - \(r *[r^{'} *r^{''}]((p, (p^{'}, p^{''})), a, d)\) \(=\) \([r *r^{'}] *r^{''}(((p, p^{'}), p^{''}), a, d)\) \(=\) \(r(p, a, r^{'}(p^{'}, I(p,a), r^{''}(p^{''}, I^{'}(p^{'}, I(p,a)), d)))\).
- Identity: given any \(A \in Ob(\textbf{Learn})\), the identity morphism is the equivalence class of learners with representative \(\langle \{*\}, I, U, r \rangle \), where \(I(*, a) = a\), \(U(*, a, a^{'}) = *\) and \(r(*, a, a^{'}) = a^{'}\) for all \(a, a^{'} \in A\).
\(\square \)
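The composition rule and the identity learner can be checked numerically on a small example. The following is a minimal sketch, not part of the formal proof: the `Learner` class, the helper `scale_learner` and the integer-valued example are hypothetical, and a learner is represented by a single representative \(\langle P, I, U, r \rangle \) rather than by its equivalence class. The two bracketings of a triple composite are compared after reassociating the parameter tuples.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass(frozen=True)
class Learner:
    implement: Callable[[Any, Any], Any]       # I(p, a) -> b
    update: Callable[[Any, Any, Any], Any]     # U(p, a, b) -> p
    request: Callable[[Any, Any, Any], Any]    # r(p, a, b) -> a

def compose(first: Learner, second: Learner) -> Learner:
    """Composite learner A -> C with parameter space P x P'."""
    def implement(params, a):
        p, q = params
        return second.implement(q, first.implement(p, a))

    def update(params, a, c):
        p, q = params
        b = first.implement(p, a)
        return (first.update(p, a, second.request(q, b, c)),
                second.update(q, b, c))

    def request(params, a, c):
        p, q = params
        b = first.implement(p, a)
        return first.request(p, a, second.request(q, b, c))

    return Learner(implement, update, request)

# Identity learner on the one-point parameter space {"*"}.
IDENTITY = Learner(lambda _p, a: a, lambda _p, _a, _b: "*", lambda _p, _a, b: b)

def scale_learner(step: int) -> Learner:
    """Hypothetical example learner: multiply the input by an integer p."""
    return Learner(
        implement=lambda p, a: p * a,
        update=lambda p, a, b: p - step * (p * a - b) * a,
        request=lambda p, a, b: a - step * (p * a - b) * p,
    )

l1, l2, l3 = scale_learner(1), scale_learner(2), scale_learner(3)
left = compose(l1, compose(l2, l3))     # parameters (p, (q, s))
right = compose(compose(l1, l2), l3)    # parameters ((p, q), s)
p, q, s, a, d = 2, 3, 5, 7, 11

pl, (ql, sl) = left.update((p, (q, s)), a, d)
(pr, qr), sr = right.update(((p, q), s), a, d)
```

Agreement of `left` and `right` on these inputs is only a spot check of the associativity equations above for one family of learners.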
Proof of Corollary 1
Immediate by induction. \(Ar(\textbf{Learn}(A,B))\) is a topos, according to Proposition 5. Then, if \(Ar^{k}(\textbf{Learn}(A,B))\) is a topos, \(Ar^{k+1}(\textbf{Learn}(A,B))\) is, by Theorem 1, also a topos. \(\square \)
Proof of Proposition 6
\(\textbf{F}^{\leftarrow }(\bar{\alpha })\), by definition, returns a morphism between the domains of morphisms \(\gamma \) and \(\rho \). Since \(\gamma \) and \(\rho \) are \(B^A A^{A \times B} S^{A \times B}\)-coalgebras they are given by \(P_{\gamma } \rightarrow B^A A^{A \times B} P_{\gamma }^{A \times B}\) and \(P_{\rho } \rightarrow B^A A^{A \times B} P_{\rho }^{A \times B}\), respectively. Then, from \(\textbf{F}^{\leftarrow }(\bar{\alpha })\) we obtain \(\bar{f}: P_{\gamma } \rightarrow P_{\rho }\) that makes the following diagram commutative:
\(\square \)
Proof of Proposition 7
This follows immediately from Proposition 6, since \(\textbf{F}^{\leftarrow }(\gamma _1) = \mathbf {\mathcal {F}}^{\leftarrow \ n}(\gamma _{n})\), where \(\gamma _1: \gamma _0 \rightarrow \gamma ^{\prime }_0\). \(\square \)
Proof of Proposition 8
Immediate from Corollary 4.3 in [2], which indicates that a progressive map \(\mathbf {\mathcal {G}}: \mathcal {L} \rightarrow \mathcal {L}\) on a cocomplete topos has a fixed point.
\(\square \)
Proof of Proposition 9
Given any sequence \(\langle \gamma ^*_0, \gamma ^*_1, \ldots , \gamma ^*_n, \ldots , \gamma ^{*}_{n^*}\rangle \in L^{*}_{(A:B)}[n^*]\), the parameter space \(\hat{P}^{n^*}\) corresponds to the coproduct of the class of domains, codomains and functions between them, \(\{(A, B, f)\}\). Thus, for each sequence in \( L^{*}_{(A:B)}[n]\) with \(n \le n^{*}\), the resulting \(P^n\) corresponds to the coproduct of a subset of \(\{(A, B, f)\}\). \(\square \)
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tohmé, F., Gangle, R. & Caterina, G. A category theory approach to the semiotics of machine learning. Ann Math Artif Intell 92, 733–751 (2024). https://doi.org/10.1007/s10472-024-09932-y
DOI: https://doi.org/10.1007/s10472-024-09932-y