A category theory approach to the semiotics of machine learning

Annals of Mathematics and Artificial Intelligence

Abstract

The successes of Machine Learning, and in particular of Deep Learning systems, have led to a reformulation of the Artificial Intelligence agenda. One of the pressing issues in the field is the extraction of knowledge out of the behavior of those systems. In this paper we propose a semiotic analysis of that behavior, based on the formal model of learners. We analyze the topos-theoretic properties that ensure the logical expressivity of the knowledge embodied by learners. Furthermore, we show that there exists an ideal universal learner, able to interpret the knowledge gained about any possible function as well as about itself, which can be monotonically approximated by networks of increasing size.

Data Availability

Data sharing not applicable to this article as no datasets were generated or analysed during the current study.

References

  1. Atkin, A.: Peirce’s Theory of Signs, In: Zalta, E. (ed.) The Stanford Encyclopedia of Philosophy, (2013). https://plato.stanford.edu/archives/sum2013/entries/peirce-semiotics

  2. Bauer, A., Lumsdaine, P.L.: On the Bourbaki-Witt Principle in Toposes. Math. Proc. Cambridge Philos. Soc. 155, 87–99 (2013)

  3. Belfiore, J.C., Bennequin, D.: Topos and Stacks of Deep Neural Networks (2021). arXiv:2106.14587

  4. Bommasani, R., et al.: On the Opportunities and Risks of Foundation Models, (2022). arXiv:2108.07258

  5. Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., Lee, P., Lee, Y.T., Li, Y., Lundberg, S., Nori, H.: Sparks of Artificial General Intelligence: Early Experiments with GPT-4, (2023). arXiv:2303.12712

  6. Davies, A., Veličković, P., Buesing, L., Blackwell, S., Zheng, D., Tomašev, N., Tanburn, R., Battaglia, P., Blundell, C., Juhász, A., Lackenby, M., Williamson, G., Hassabis, D., Kohli, P.: Advancing Mathematics by Guiding Human Intuition with AI. Nature 600(7887), 70–74 (2021)

  7. Ferruz, N., Zitnik, M., Oudeyer, P.Y., Hine, E., Sengupta, N., Shi, Y., Mincu, D., Porsdam Mann, S., Das, P., Stella, F.: Anniversary AI reflections. Nat. Mach. Intell. 6(1), 6–12 (2024)

  8. Emmenegger, J., Pasquali, F., Rosolini, G.: A Characterisation of Elementary Fibrations. Ann. Pure Appl. Log. 173(6), 103103 (2022)

  9. Fong, B., Spivak, D., Tuyéras, R.: Backprop as Functor: A Compositional Perspective on Supervised Learning. Proceedings of the 34th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS), 1–13 (2019)

  10. Fortnow, L.: Fifty Years of P vs. NP and the Possibility of the Impossible. Commun. ACM 65, 76–85 (2022)

  11. Gavranović, B.: Meta-Learning and Monads, (2021). https://www.brunogavranovic.com/posts/2021-10-13-meta-learning-and-monads.html

  12. Hedges, J.: From Open Learners to Open Games, (2019). arXiv:1902.08666

  13. Hutter, M.: Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability, Springer Science and Business Media (2004)

  14. Johnstone, P. T.: Sketches of an Elephant vol. 1 and 2, Oxford University Press (2002)

  15. Kirchherr, W., Li, M., Vitányi, P.: The Miraculous Universal Distribution. Math. Intell. 19, 7–15 (1997)

  16. Kelly, G.M.: Basic Concepts of Enriched Category Theory. Cambridge University Press (1982)

  17. Lee, M.: A Mathematical Investigation of Hallucination and Creativity in GPT Models. Mathematics 11, 2320 (2023)

  18. Lee, M.: A Mathematical Interpretation of Autoregressive Generative Pre-Trained Transformer and Self-Supervised Learning. Mathematics 11, 2451 (2023)

  19. Lipman, B.L.: How to Decide How to Decide How to \(...\): Modeling Limited Rationality. Econometrica 59, 1105–1125 (1991)

  20. MacLane, S., Moerdijk, I.: Sheaves in Geometry and Logic: a First Introduction to Topos Theory, Springer Science & Business Media (2012)

  21. McLarty, C.: Elementary Categories, Elementary Toposes. Clarendon Press (1992)

  22. Mitchell, M., Krakauer, D.C.: The Debate over Understanding in AI’s Large Language Models. Proc. Natl. Acad. Sci. 120(13), e2215907120 (2023)

  23. Olah, C.: Neural Networks, Types, and Functional Programming, (2015). https://colah.github.io/posts/2015-09-NN-Types-FP/

  24. Peirce, C.S. ed. by Bellucci, F.: Charles S. Peirce: Selected Writings on Semiotics 1894-1912, De Gruyter/Mouton (2020)

  25. Priss, U.: Semiotic-Conceptual Analysis: a Proposal. Int. J. General Syst. 46(5), 569–585 (2017)

  26. Priss, U.: A Semiotic Perspective on Polysemy. Ann. Math. Artif. Intell. 90(11–12), 1125–1138 (2022)

  27. Rezk, C.: Toposes and Homotopy Toposes, (2010). https://faculty.math.illinois.edu/rezk/homotopy-topos-sketch.pdf

  28. Schmidhuber, J.: Annotated History of Modern AI and Deep Learning, (2022). arXiv:2212.11279

  29. Shiebler, D., Gavranović, B., Wilson, P.: Category Theory in Machine Learning, (2021). arXiv:2106.07032

  30. Southwell, R., Gupta, N.: Categories and Toposes: Visualized and Explained, KDP Publishing (2021)

  31. Spivak, D.I.: Category Theory for the Sciences. MIT Press (2014)

  32. Spivak, D.I.: Learners’ Languages, (2021). arXiv:2103.01189

  33. Stepin, I., Alonso, J.M., Catala, A., Pereira-Fariña, M.: A Survey of Contrastive and Counterfactual Explanation Generation Methods for Explainable Artificial Intelligence. IEEE Access 9, 11974–12001 (2021)

  34. Vickers, P., Faith, J., Rossiter, N.: Understanding Visualization: A Formal Approach using Category Theory and Semiotics. IEEE Trans. Vis. Comput. Graph. 19, 1048–1061 (2012)

  35. Yuan, Y.: On the Power of Foundation Models. Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202 (2023)

  36. Zimmermann, A.: Philosophers on GPT-3, Daily Nous (2020) https://dailynous.com/2020/07/30/philosophers-gpt-3/

Author information

Corresponding author

Correspondence to Fernando Tohmé.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Proof of Proposition 1

We prove this only for the case of a complete category \(\mathcal {C}\). The case of a cocomplete category is analogous. Consider an indexing category \(\mathcal {I}\) and a functor \(F: \mathcal {I} \rightarrow \mathcal {C}^{\textbf{2}}\). This constitutes a diagram. The limit of diagram F is required to be an object \(\lim F\) in \(\mathcal {C}^{\textbf{2}}\) and morphisms \(\hat{f}_x: \lim F \rightarrow F(x)\) for each \(x \in \text{ Ob }(\mathcal {I})\) such that, if there exists another object \(\beta \) in \(\mathcal {C}^{\textbf{2}}\) with morphisms \(\hat{f}_x^{\prime }: \beta \rightarrow F(x)\) for each \(x \in \text{ Ob }(\mathcal {I})\) there exists a unique morphism \(!: \beta \rightarrow \lim F\) that makes everything commute.

Then, consider two projection functors

$$ E_n : \mathcal {C}^{\textbf{2}} \rightarrow \mathcal {C} $$

for \(n \in \text{ Ob }(\textbf{2})\). Given any diagram \(F: \mathcal {I} \rightarrow \mathcal {C}^{\textbf{2}}\) we obtain

$$ E_n \circ F: \mathcal {I} \rightarrow \mathcal {C}$$

a diagram in \(\mathcal {C}\). Since this category is complete, there exists a limit of this functor, \(\lim E_n \circ F\).

Then, a functor \(\alpha _F: \textbf{2} \rightarrow \mathcal {C}\) can be defined, such that \(\alpha _F(n) = \lim E_n \circ F\) for \(n = 0,1\) and

$$\alpha _F(i_{01}) = f_{F}: (\lim E_0 \circ F) \rightarrow (\lim E_1 \circ F)$$

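Here \(f_{F}\) is the unique morphism induced by the universal property of \(\lim E_1 \circ F\), applied to the cone obtained by composing the projections of \(\lim E_0 \circ F\) with the arrows \(F(x)\). The object \(\alpha _F\), together with the componentwise projections, yields the limit of diagram F. \(\square \)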
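The componentwise construction used in this proof can be illustrated in a very small case. The following is only a sketch, under the assumption that \(\mathcal {C}\) is the category of finite sets and that the diagram F consists of two arrows with no morphisms between them, so that the limit is a binary product. The representation of an arrow as a dictionary and the names `make_arrow`, `arrow_product` and `projection_commutes` are hypothetical and do not come from the paper.

```python
from itertools import product

def make_arrow(dom, cod, mapping):
    """An object of Set^2: a function between finite sets (illustrative encoding)."""
    assert set(mapping) == set(dom) and set(mapping.values()) <= set(cod)
    return {"dom": list(dom), "cod": list(cod), "map": dict(mapping)}

def arrow_product(f, g):
    """Product of f and g, computed separately at level 0 and level 1."""
    dom = list(product(f["dom"], g["dom"]))   # limit of E_0 composed with F
    cod = list(product(f["cod"], g["cod"]))   # limit of E_1 composed with F
    # Induced map between the two componentwise limits.
    mapping = {(a, b): (f["map"][a], g["map"][b]) for a, b in dom}
    return make_arrow(dom, cod, mapping)

def projection_commutes(fg, f, index):
    """Check that the projection onto factor `index` is a morphism of arrows."""
    return all(f["map"][x[index]] == fg["map"][x][index] for x in fg["dom"])

f = make_arrow([0, 1, 2], ["u", "v"], {0: "u", 1: "u", 2: "v"})
g = make_arrow(["x", "y"], ["s"], {"x": "s", "y": "s"})
fg = arrow_product(f, g)
```

In this toy case the two projections are commuting squares, which is the compatibility that the morphisms \(\hat{f}_x\) must satisfy.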

Proof of Proposition 2

A category \(\mathcal {C}\) is cartesian closed if it satisfies the following three properties: (i) it has a terminal object, (ii) for any pair \(x,y \in \text{ Ob }(\mathcal {C})\) their product \(x \times y\) exists in \(\mathcal {C}\), (iii) for any pair \(a,b \in \text{ Ob }(\mathcal {C})\) there exists their exponential \(a^b\) in \(\mathcal {C}\).

Conditions (i) and (ii) are satisfied in \(\mathcal {C}^{\textbf{2}}\), according to Proposition 1.

With respect to (iii), notice that we can define a bifunctor

$$\bar{\alpha }^{\bar{\beta }}: \textbf{2}^{\text{ op }} \times \textbf{2} \rightarrow \mathcal {C}$$

where

$$\bar{\alpha }^{\bar{\beta }}(n,n') \ = \ \Pi _{n \rightarrow n'} \alpha (n')^{\beta (n')}$$

and \(n, n' \in \text{ Ob }(\textbf{2})\), i.e. \(n, n' \in \{0,1\}\). Note that each \(\alpha (n')^{\beta (n')}\) is an exponential object in \(\mathcal {C}\), which exists since \(\mathcal {C}\) is a cartesian closed category.

Consider a family of wedges. Each wedge consists of \(\gamma \in \text{ Ob }(\mathcal {C}^{\textbf{2}})\) and morphisms

$$w_{n'}: \gamma \rightarrow \bar{\alpha }^{\bar{\beta }}(n,n')$$

for every object \(n \in \text{ Ob }(\textbf{2})\), such that the following diagram commutes:

where \(f: n \rightarrow n'\), \(id_n: n \rightarrow n\) and \(id_{n'}: n' \rightarrow n'\) are morphisms in \(\textbf{2}\).

An end for \(\bar{\alpha }^{\bar{\beta }}\) is a universal wedge such that, for each \(n \in \text{ Ob }(\textbf{2})\) it is denoted

$$\alpha ^{\beta }(n) = \int _{n' \in \textbf{2}} \Pi _{n \rightarrow n'} \alpha (n')^{\beta (n')}$$

with projections \(p_{n'}\) for \(n'=0,1\) such that for any other object \(\gamma \in \text{ Ob }(\mathcal {C}^{\textbf{2}})\) and morphisms \(\{w_{n'}\}_{n' \in \text{ Ob }(\textbf{2})}\), there exists a unique morphism that makes the following diagram commutative:

To see that \(\alpha ^{\beta }\) is an exponential object consider the class of morphisms \(\text{ Hom}_{\mathcal {C}^{\textbf{2}}}(\gamma , \alpha ^{\beta })\) defined as

$$\text{ Hom}_{\mathcal {C}^{\textbf{2}}}(\gamma , \alpha ^{\beta }) = \int _{n \in \textbf{2}} \text{ Hom}_{\mathcal {C}}(\gamma (n),\alpha ^{\beta }(n))$$

Since \(\mathcal {C}\) is cartesian closed, we have that

$$ \int _{n \in \textbf{2}} \text{ Hom}_{\mathcal {C}}(\gamma (n),\alpha ^{\beta }(n)) = \int _{n \in \textbf{2}} \text{ Hom}_{\mathcal {C}}(\gamma (n) \times \beta (n), \alpha (n)) \ = \text{ Hom}_{\mathcal {C}^{\textbf{2}}}(\gamma \times \beta , \alpha ). $$

Thus, \(\alpha ^{\beta }\) is an exponential object in \(\mathcal {C}^{\textbf{2}}\). \(\square \)
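For intuition, the exponential can be computed by brute force when \(\mathcal {C}\) is the category of finite sets. The sketch below rests on that assumption and on a standard description of exponentials in \(\textbf{Set}^{\textbf{2}}\): at level 0 the exponential consists of commuting squares from \(\beta \) to \(\alpha \), and at level 1 of functions \(\beta (1) \rightarrow \alpha (1)\). An arrow is encoded as a triple (domain, codomain, map), and all function names are hypothetical. The last lines compare the sizes of the two hom-sets in the adjunction for one small choice of \(\alpha \), \(\beta \) and \(\gamma \).

```python
from itertools import product

def functions(dom, cod):
    """All functions dom -> cod, each encoded as a tuple of (input, output) pairs."""
    dom = list(dom)
    return [tuple(zip(dom, outs)) for outs in product(list(cod), repeat=len(dom))]

def squares(src, tgt):
    """Morphisms src -> tgt in Set^2: pairs (u, v) with tgt . u = v . src."""
    (s0, s1, fs), (t0, t1, ft) = src, tgt
    found = []
    for u in functions(s0, t0):
        for v in functions(s1, t1):
            ud, vd = dict(u), dict(v)
            if all(ft[ud[x]] == vd[fs[x]] for x in s0):
                found.append((u, v))
    return found

def arrow_product(f, g):
    """Componentwise product of two arrows."""
    (f0, f1, fm), (g0, g1, gm) = f, g
    dom = list(product(f0, g0))
    cod = list(product(f1, g1))
    return dom, cod, {(a, b): (fm[a], gm[b]) for a, b in dom}

def exponential(alpha, beta):
    """Exponential alpha^beta, again an arrow (level 0, level 1, map)."""
    level0 = squares(beta, alpha)              # commuting squares beta -> alpha
    level1 = functions(beta[1], alpha[1])      # functions beta(1) -> alpha(1)
    return level0, level1, {sq: sq[1] for sq in level0}

alpha = ([0, 1, 2], ["t", "f"], {0: "t", 1: "t", 2: "f"})
beta = (["x"], ["y", "z"], {"x": "y"})
gamma = (["g1", "g2"], ["h"], {"g1": "h", "g2": "h"})

lhs = len(squares(gamma, exponential(alpha, beta)))
rhs = len(squares(arrow_product(gamma, beta), alpha))
```

Equal counts are only a sanity check of the bijection between the hom-sets in this instance, not a proof of naturality.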

Proof of Proposition 3

Consider an object \(\bar{\Omega }\) in \(\mathcal {C}\) such that:

$$ \bar{\Omega } \hookrightarrow \Omega \times \Omega $$

obtained as the pullback of \(h: \Omega \times \Omega \rightarrow \Omega \) and \(T: 1 \rightarrow \Omega \) in \(\mathcal {C}\):

Given \(\varvec{\omega }: \bar{\Omega } \rightarrow \Omega \in \mathcal {C}^{\textbf{2}}\), we claim that there exists a monomorphism \(\top _{\mathcal {C}^{\textbf{2}}}: \text{ id}_{1} \hookrightarrow \varvec{\omega }\) (where \(\text{ id}_1\) is the identity arrow of the terminal object 1 in \(\mathcal {C}\)) such that for every monomorphism \(f: \mu \hookrightarrow \alpha \), there exists a unique morphism \(\chi _f: \alpha \rightarrow \varvec{\omega }\) that makes the following diagram commutative:

This is ensured if, on one hand, the following diagrams commute in the topos \(\mathcal {C}\):

for \(n = 0, 1 \in \text{ Ob }(\textbf{2})\). This follows trivially from the definition of monomorphisms in \(\mathcal {C}^{\textbf{2}}\). On the other hand, an additional requirement is that \(\mu (0) \hookrightarrow \alpha (0)\) factors through the pullback \(f(0,1): \mu (1) \times _{\alpha (1)} \alpha (0) \hookrightarrow \alpha (0)\). Informally, this means that \(\bar{\Omega }\) makes the following diagram commutative:

Thus, the subobject classifier in \(\mathcal {C}^{\textbf{2}}\) is \(\omega = \hat{\Omega }: \bar{\Omega } \rightarrow \Omega \), which lifts the subobject classifier \(\Omega \) in \(\mathcal {C}\). \(\square \)
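A small computation may help to picture \(\bar{\Omega }\). The sketch below makes several assumptions that are not stated in this appendix: \(\mathcal {C}\) is the category of finite sets, \(\Omega \) is the two-element set of truth values, h is taken to be implication, and the arrow \(\bar{\Omega } \rightarrow \Omega \) is taken to be the second projection. Under these assumptions \(\bar{\Omega }\) has three elements, and the characteristic map records, for each element of \(\alpha (0)\), whether it lies in \(\mu (0)\) and whether its image lies in \(\mu (1)\). The names `classify` and `omega_arrow` are hypothetical.

```python
from itertools import product

OMEGA = (False, True)

def implies(x, y):
    """Assumed choice for h: Omega x Omega -> Omega."""
    return (not x) or y

# Pullback of h along "true": the pairs on which h evaluates to True.
OMEGA_BAR = [(x, y) for x, y in product(OMEGA, repeat=2) if implies(x, y)]

def omega_arrow(pair):
    """The arrow OMEGA_BAR -> OMEGA, assumed here to be the second projection."""
    return pair[1]

def classify(alpha, mu0, mu1):
    """Characteristic map of the subobject (mu0 -> mu1) of alpha = (a0, a1, f)."""
    a0, a1, f = alpha
    chi0 = {a: (a in mu0, f[a] in mu1) for a in a0}   # level 0, lands in OMEGA_BAR
    chi1 = {b: (b in mu1) for b in a1}                # level 1, lands in OMEGA
    return chi0, chi1

alpha = ([0, 1, 2], ["u", "v"], {0: "u", 1: "u", 2: "v"})
mu0, mu1 = {0}, {"u"}          # a subobject: the map of alpha sends mu0 into mu1
chi0, chi1 = classify(alpha, mu0, mu1)

# The level-0 part of the subobject is recovered as the preimage of (True, True).
recovered_mu0 = {a for a, value in chi0.items() if value == (True, True)}
```

The pair (True, False) never occurs as a value of `chi0`, because an element of \(\mu (0)\) is always sent into \(\mu (1)\); this is the factorisation condition mentioned in the proof.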

Proof of Theorem 1

Trivial. If \(\mathcal {C}\) is a topos it is both complete and cocomplete, it has exponentials and a subobject classifier. Thus, according to Propositions 1, 2 and 3, these properties are shared by \(\mathcal {C}^{\textbf{2}}\). \(\square \)

Proof of Proposition 4

Since \(\alpha : \textbf{2} \rightarrow \mathcal {C}\), according to the Yoneda lemma for \(\mathcal {C}\) we have that

$$\text{ Nat }(\text{ Hom}_{\mathcal {C}}(\cdot , \alpha (n)), F_n) \cong F_n(\alpha (n))$$

for \(n= 0,1 \in \textbf{2}\), where \(F_n\) is the projection of F on the \(\mathcal {C}\) corresponding to the images of functors on n. Then we can define \(\text{ Hom}_{\mathcal {C}^{\textbf{2}}}\) and F satisfying the Yoneda condition for \(\mathcal {C}^{\textbf{2}}\). \(\square \)

Proof of Lemma 1

Any category \(\mathcal {F}-\textbf{Coalg}\) is such that the objects are functions

$$h: S \rightarrow \mathcal {F}(S)$$

and the morphisms are the obvious maps between functions \(h: S \rightarrow \mathcal {F}(S)\) and \(h^{'}: S^{'} \rightarrow \mathcal {F}(S^{'})\) that yield a sign in this context, that is, a commutative diagram

Then, given that,

$$[Ay^A, By^B]\ \cong \ B^A A^{A \times B} y^{A \times B}$$

we have that our \(\mathcal {F}\) is an endofunctor in \(\textbf{Set}\), defined for any set S as

$$ \mathcal {F}(S) = B^A A^{A \times B} S^{A \times B}$$

According to Theorem 3.3 in [32] \(B^A A^{A \times B} S^{A \times B} - \textbf{Coalg}\) is a category of (co)presheaves on \(\textbf{Set}\). This indicates that \(\textbf{Learn}(A,B)\) is a topos. \(\square \)
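To make the shape of \(\mathcal {F}(S) = B^A A^{A \times B} S^{A \times B}\) concrete, a coalgebra \(h: S \rightarrow \mathcal {F}(S)\) can be read as assigning to each state three functions: one in \(B^A\), one in \(A^{A \times B}\) and one in \(S^{A \times B}\). The following toy example is only a sketch under the assumption that A, B and S are all the real numbers (floats), with the state interpreted as the slope of a linear model. The names `implement`, `request` and `update`, the constant `LEARNING_RATE` and the gradient-style formulas are illustrative choices, not definitions from the paper.

```python
LEARNING_RATE = 0.125  # hypothetical step size, chosen only for illustration

def coalgebra(s):
    """Toy structure map h: S -> B^A x A^(A x B) x S^(A x B).

    The state s is the slope of the model a |-> s * a.
    """
    def implement(a):              # component in B^A
        return s * a

    def request(a, b):             # component in A^(A x B)
        return a - LEARNING_RATE * (s * a - b) * s

    def update(a, b):              # component in S^(A x B)
        return s - LEARNING_RATE * (s * a - b) * a

    return implement, request, update

def run(s, examples):
    """Unfold the coalgebra along a finite stream of (a, b) training pairs."""
    for a, b in examples:
        _, _, update = coalgebra(s)
        s = update(a, b)
    return s

final_state = run(1.0, [(2.0, 4.0)] * 3)
```

Each call to `coalgebra` exposes one step of behaviour of the state, which is the sense in which the parameter space of a learner carries a coalgebra structure.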

Proof of Theorem 2

\(\textbf{Learn}\) is a category since it satisfies the following properties:

  • Composition of morphisms: given two learners, \(A \rightarrow B\) and \(B \rightarrow C\), defined as two equivalence classes \(\langle \bar{P}, \bar{I}, \bar{U}, \bar{r}\rangle \) and \(\langle \bar{P}^{'}, \bar{I}^{'}, \bar{U}^{'}, \bar{r}^{'} \rangle \), their composition is a learner \(A \rightarrow C\) consisting of the equivalence class with representative \(\langle P \times P^{'}, I *I^{'}, U *U^{'}, r *r^{'}\rangle \), where:

    • \(I *I^{'}((p, p^{'}),a) = I^{'}(p^{'}, I(p,a)) \in C\), where \(p \in P\), \(p^{'} \in P^{'}\), \(a \in A\) for \(P \in \bar{P}\) and \(P^{'} \in \bar{P}^{'}\).

    • \(U *U^{'}((p, p^{'}), a, c) = (U(p, a, r^{'}(p^{'}, I(p,a), c)), U^{'}(p^{'}, I(p,a), c)) \in P \times P^{\prime }\).

    • \(r *r^{'}((p, p^{'}), a, c) = r(p, a, r^{'}(p^{'}, I(p,a), c)) \in A\).

    With this specification, the composition of learners is associative. That is, given learners \(\langle \bar{P}, \bar{I}, \bar{U}, \bar{r} \rangle : A \rightarrow B\), \(\langle \bar{P}^{'}, \bar{I}^{'}, \bar{U}^{'}, \bar{r}^{'} \rangle : B \rightarrow C\) and \(\langle \bar{P}^{''}, \bar{I}^{''}, \bar{U}^{''}, \bar{r}^{''} \rangle : C \rightarrow D\), and writing \(b = I(p,a)\) and \(c = r^{''}(p^{''}, I^{'}(p^{'},b), d)\), we have:

    • \(I *[I^{'} *I^{''}]((p, (p^{'}, p^{''})), a)\) \(=\) \([I *I^{'}] *I^{''}(((p, p^{'}), p^{''}), a)\) \(=\) \(I^{''}(p^{''}, I^{'}(p^{'}, I(p,a)))\).

    • \(U *[U^{'} *U^{''}]((p, (p^{'}, p^{''})), a, d)\) \(=\) \([U *U^{'}] *U^{''}(((p, p^{'}), p^{''}), a, d)\) \(=\) \((U(p,a, r^{'}(p^{'}, I(p,a),c)),\) \( U^{'}(p^{'}, I(p,a),\) \( r^{''}(p^{''}, I^{'}(p^{'},b), d)),\) \( U^{''}(p^{''}, I^{'}(p^{'},b), d))\).

    • \(r *[r^{'} *r^{''}]((p, (p^{'}, p^{''})), a, d)\) \(=\) \([r *r^{'}] *r^{''}(((p, p^{'}), p^{''}), a, d)\) \(=\) \(r(p, a, r^{'}(p^{'}, I(p,a), r^{''}(p^{''}, I^{'}(p^{'}, I(p,a)), d)))\).

  • Identity: given any \(A \in Ob(\textbf{Learn})\), the identity morphism is the equivalence class of learners with representative \(\langle \{*\}, I, U, r \rangle \), where \(I(*, a) = a\), \(U(*, a, a) = *\) and \(r(*, a, a) = a\) for each \(a \in A\).

\(\square \)
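The composition rules above can be checked numerically on small examples. The sketch below encodes a learner as a triple of Python functions (I, U, r) acting on an explicit parameter value, ignores the passage to equivalence classes, and uses two hypothetical integer-valued learners, `SCALE` and `SHIFT`, chosen only so that the arithmetic is exact. The identity learner is written with a request that returns its third argument, which agrees with \(r(*, a, a) = a\) on the diagonal; this reading is an assumption.

```python
def compose(first, second):
    """Composite of learners first: A -> B and second: B -> C, following I*I', U*U', r*r'."""
    I1, U1, r1 = first
    I2, U2, r2 = second

    def I(params, a):
        p, q = params
        return I2(q, I1(p, a))

    def U(params, a, c):
        p, q = params
        b = I1(p, a)
        return (U1(p, a, r2(q, b, c)), U2(q, b, c))

    def r(params, a, c):
        p, q = params
        b = I1(p, a)
        return r1(p, a, r2(q, b, c))

    return I, U, r

# Two hypothetical learners on the integers.
SCALE = (
    lambda p, a: p * a,                       # implementation
    lambda p, a, b: p - (p * a - b) * a,      # update
    lambda p, a, b: a - (p * a - b) * p,      # request
)
SHIFT = (
    lambda p, a: p + a,
    lambda p, a, b: p - ((p + a) - b),
    lambda p, a, b: a - ((p + a) - b),
)
# Identity learner with one-point parameter space {"*"}.
IDENTITY = (lambda p, a: a, lambda p, a, b: "*", lambda p, a, b: b)

left = compose(SCALE, compose(SHIFT, SCALE))      # parameters (p, (p', p''))
right = compose(compose(SCALE, SHIFT), SCALE)     # parameters ((p, p'), p'')

p, q, s, a, d = 2, 3, 5, 7, 11
u1, (u2, u3) = left[1]((p, (q, s)), a, d)
(v1, v2), v3 = right[1](((p, q), s), a, d)
```

The two bracketings produce parameters in \(P \times (P^{'} \times P^{''})\) and \((P \times P^{'}) \times P^{''}\) respectively, so the updates are compared after flattening the nested pairs.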

Proof of Corollary 1

Immediate by induction. \(Ar(\textbf{Learn}(A,B))\) is a topos, according to Proposition 5. Then, if \(Ar^{k}(\textbf{Learn}(A,B))\) is a topos, \(Ar^{k+1}(\textbf{Learn}(A,B))\) is, by Theorem 1, also a topos. \(\square \)

Proof of Proposition 6

\(\textbf{F}^{\leftarrow }(\bar{\alpha })\), by definition, returns a morphism between the domains of morphisms \(\gamma \) and \(\rho \). Since \(\gamma \) and \(\rho \) are \(B^A A^{A \times B} S^{A \times B}\)-coalgebras they are given by \(P_{\gamma } \rightarrow B^A A^{A \times B} P_{\gamma }^{A \times B}\) and \(P_{\rho } \rightarrow B^A A^{A \times B} P_{\rho }^{A \times B}\), respectively. Then, from \(\textbf{F}^{\leftarrow }(\bar{\alpha })\) we obtain \(\bar{f}: P_{\gamma } \rightarrow P_{\rho }\) that makes the following diagram commutative:

\(\square \)

Proof of Proposition 7

This follows immediately from Proposition 6, since \(\textbf{F}^{\leftarrow }(\gamma _1) = \mathbf {\mathcal {F}}^{\leftarrow \ n}(\gamma _{n})\), where \(\gamma _1: \gamma _0 \rightarrow \gamma ^{\prime }_0\). \(\square \)

Proof of Proposition 8

Immediate from Corollary 4.3 in [2], which indicates that a progressive map \(\mathbf {\mathcal {G}}: \mathcal {L} \rightarrow \mathcal {L}\) on a cocomplete topos has a fixed point.

\(\square \)
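The idea of a progressive map reaching a fixed point can be pictured in a finite setting, where plain iteration suffices. The sketch below is only an illustration under strong simplifying assumptions: the poset is the family of subsets of \(\{0, \ldots , 4\}\) ordered by inclusion, and the map `grow` is a hypothetical example of a map satisfying \(X \subseteq G(X)\). It does not model the general argument of the cited result.

```python
def iterate_to_fixed_point(g, start, max_steps=1000):
    """Iterate a progressive map until it stabilises (finite case only)."""
    x = start
    for _ in range(max_steps):
        nxt = g(x)
        # Progressiveness: each step must stay above the current point.
        assert x <= nxt, "map is not progressive at this point"
        if nxt == x:
            return x
        x = nxt
    raise RuntimeError("no fixed point reached within max_steps")

def grow(x, bound=4):
    """Hypothetical progressive map on subsets of {0, ..., bound}."""
    return frozenset(x) | {0} | {n + 1 for n in x if n + 1 <= bound}

fixed = iterate_to_fixed_point(grow, frozenset())
```

Here the comparison `x <= nxt` is subset inclusion on `frozenset`, so the assertion checks progressiveness along the orbit that is actually visited.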

Proof of Proposition 9

Given any sequence \(\langle \gamma ^*_0, \gamma ^*_1, \ldots , \gamma ^*_n, \ldots , \gamma ^{*}_{n^*}\rangle \in L^{*}_{(A:B)}[n^*]\), the parameter space \(\hat{P}^{n^*}\) corresponds to the coproduct of the class of domains, codomains and functions between them, \(\{(A, B, f)\}\). Thus, for each sequence in \( L^{*}_{(A:B)}[n]\) with \(n \le n^{*}\), the resulting \(P^n\) can correspond to the coproduct of a subset of \(\{(A, B, f)\}\). \(\square \)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tohmé, F., Gangle, R. & Caterina, G. A category theory approach to the semiotics of machine learning. Ann Math Artif Intell 92, 733–751 (2024). https://doi.org/10.1007/s10472-024-09932-y
