# Topology driven modeling: the IS metaphor

- First Online:

DOI: 10.1007/s11047-014-9436-7

- Cite this article as:
- Merelli, E., Pettini, M. & Rasetti, M. Nat Comput (2015) 14: 421. doi:10.1007/s11047-014-9436-7

- 7 Citations
- 1.1k Downloads

## Abstract

In order to define a new method for analyzing the immune system within the realm of Big Data, we bear on the metaphor provided by an extension of Parisi’s model, based on a mean field approach. The novelty is the multilinearity of the couplings in the configurational variables. This peculiarity allows us to compare the partition function \(Z\) with a particular functor of topological field theory—the generating function of the Betti numbers of the state manifold of the system—which contains the same global information of the system configurations and of the data set representing them. The comparison between the Betti numbers of the model and the real Betti numbers obtained from the topological analysis of phenomenological data, is expected to discover hidden *n-ary relations* among idiotypes and anti-idiotypes. The data topological analysis will select global features, reducible neither to a mere subgraph nor to a metric or vector space. How the immune system reacts, how it evolves, how it responds to stimuli is the result of an interaction that took place among many entities constrained in specific configurations which are relational. Within this metaphor, the proposed method turns out to be a global topological application of the S[B] paradigm for modeling complex systems.

### Keywords

Immune system Multilinear mean field Pattern discovery Complex systems Adaptive models S[B] paradigm Topology of data Betti numbers Big Data## 1 Introduction

The objective pursued in this note is to frame the research on the immune system as part of *data science*. Such research is naturally complex and articulated and our contribution intends to be here along the lines of seeing it as a viable candidate for topological data analytics and an example of the S[B] paradigm for modeling complex systems. We recall that data science is the practice to deriving valuable insights from data by challenging all the issues related to the processing of very large data sets, while Big Data is jargon to indicate such a large collection of data (for example, exabytes) characterized by high-dimensionality, redundancy, and noise. The analysis of Big Data requires handling high-dimensional vectors capable of weaning out the unimportant, redundant coordinates. The notion of data space, its geometry and topology are the most natural tools to handle the unprecedentedly large, high-dimensional, complex sets of data (Carlsson 2009; Edelsbrunner and Harer 2010); basic ingredient of the new data-driven complexity science (TOPDRIM 2012; Merelli and Rasetti 2013).

Topology, the branch of mathematics dealing with qualitative geometric information such as connectivity, classification of loops and higher dimensional manifolds, studies properties of geometric objects (shapes) in a way which is less sensitive to metrics than geometric methods: it ignores the value of distance function and replaces it with the notion of connective nearness: *proximity*. All these features make topology ideal for analysing the space of data.

Starting from the notion of a mean field proposed by Parisi in his simple model for idiotypic network (Parisi 1990), we propose a more sophisticated version that is multilinear in the configurational variables (the antibody concentrations) instead of being constant or at most linear. Multi-linearity allows us to recognize in the partition function \(Z\) of the model, that embodies all the statistical properties of the system at equilibrium, features similar to those of a particular *functor* of a topological field theory. The latter contains indeed the same global information about the topological properties (specifically its global invariants) of the system configuration space and can be identified with the generating function of Betti numbers, namely the Poincaré polynomial of data space (Atiyah and Bott 1983). Once the homology of the space of data has been constructed, and its generating cycles have been defined, the related two sets of Betti numbers can be compared. In this way, self-consistent information is obtained, regarding \(2{\hbox {-}}ary\), \(3{\hbox {-}}ary,\, \dots n{\hbox {-}}ary\) relations among antibodies. Comparison between the Betti numbers of the model and the real Betti numbers, obtained by constructing the topology of phenomenological immune system space of data, will unveil the hidden relations between idiotypes and anti-idiotypes; in particular, those relations where components interact indistinctly and therefore can not be reduced to a mere subgraph, but rather they bear on a new concept of interaction, scale-free and metric-free. The analysis of Betti numbers on phenomenological data can be dealt with techniques based on persistent homology (Carlsson 2009; Petri et al. 2013).

The challenge we are facing is to unveil whether in natural, multi-level complex systems, \(n\)-body interactions can drive the emergence of novel *qualia* in these systems. In physics, the interactions between material objects in real space are binary. This means that mutual forces and motions are produced by two-body interactions, the building blocks of any many-particle system. Thus at the atomic or molecular level description of matter (living or not) the total force acting on any given particle is the result of the composition of binary interactions. However, how can we discover if \(n\)-body interactions do exist? What we are proposing here is to use the IS metaphor, i.e. a complex system whose adaptivity is driven by data, as a global topological application of the S[B] paradigm. S[B] allows us to entangle in a unique model the computational component with the coordination. In particular, B accounts for the computation while S describes the global computation context (Merelli et al. 2013). The adaptation phase occurs when a machine can no longer compute in a given state of the system, thus the system changes state, i.e. the global context of computation. In the IS metaphor the computation context can be identified by the global invariants while the computation with the model of interactions, a sort of interactive machine. Each time we discover new global invariants, a new context of computation arises and with it a new IS model must be generated; we call this step the adaptation phase.

In the following, after giving a brief description of the antigen-free immune system and recalling Parisi’s mean field model, we formally define the new topological field model, and, finally, discuss the S[B] paradigm. An appendix is provided with a general introduction to the fundamental tool of *persistent homology and Betti numbers*.

## 2 The antigen-free immune system

*antigens*), which are antibody generators. In the scheme proposed by Jerne, it is the antigen that provokes an immune response and each antibody is represented as a large Y-shaped protein. The immune system uses this protein to identify and neutralize foreign objects. The antibody can recognize and bind a specific part of the antigen; resorting to this binding mechanism it can block the attack. Moreover, in Jerne’s network theory, antibodies are capable of being recognized by other antibodies; whenever this happens the former is suppressed and its concentration is reduced while the latter is stimulated and its concentration increases (see Fig. 1).

The mechanism whereby the production of a given antibody elicits or suppresses the production of other antibodies that, in turn, elicit or suppress the production of other antibodies like a concatenation of events, hints to a strict analogy of the immune system function with memory in the brain. It recalls the way in which a firing neuron may induce or inhibit the firing of other neurons, and so forth. On the assumption that a functional network of antibodies is possible, several models have been constructed, among which Parisi’s model. The latter studies the persistence of immune memory in the absence of any driving effect of external antigens and it offers a robust, though simple, theoretical framework without providing detailed description of the system (Parisi 1990).

The model we propose is a preliminary test of data field theory; it aims at a deeper understanding of the functional properties that the global and persistent topological properties of an antibodies data space can imply. In particular, it targets at discovering the existence of \(n\)-ary relations among antibodies and determining how the ensuing configurations influence the immune system reaction to the presence of antigens. The extraction of global qualitative information from an antibodies data space (e.g. concentrations), should lead to the discovery of those characteristics that are shared in a group of immunoglobulin receptor molecules. This means not only discovering a single idiotype, but the capacity of being active in the presence of \(n\) others. We want to prove that topological data analysis, through persistent homology and its Betti numbers, allows us to determine the effective \(n\)-antibody configurations. Note that the models proposed in literature to describe the relationship between structure and function in biological networks are all based on the concept that any relation can be reduced to a set of binary relation (Hart et al. 2009): we argue that this is not necessarily the case. We start thinking of models as relationships, i.e. facts in logical space of forms. Forms that can be directly classified by Betti numbers, extracted by calculating the Betti numbers through the persistent homology of the space of data and used in the frame of a conceptual model able to bear on those topological features.

### 2.1 Parisi mean field model for IS

The simplest and most efficient network of the immune system is represented by a model that can be easily formulated in the absence of antigens. Although it is well known that the number of specific lymphocytes plays a crucial role, the variables of the network model are limited to the antibody concentrations.

The mean-field idiotypic network model of antigen-free Immune System, proposed by G. Parisi and inspired by an earlier Hopfield’s model conceived to represent the brain and many other similar models (Hopfield 1982; Hoffmann 1975, 2010; Farmer et al. 1986; Varela et al. 1988), describes essentially an iterated cascade of events, in which the production of a given antibody provokes, or possibly inhibits, the production of other antibodies, which in turn induce, or possibly impede, the production of other antibodies, which in turn give rise to or prevent the production of other antibodies, etc..

*mean field*) represents the total stimulatory/inhibitory (depending on its sign) effect of the whole network on the \(i\)-th antibody. \(h_i\) is positive when the excitatory effect of the other antibodies is greater than the suppressive effect and then \(c_i\) is one. Otherwise \(h_i\) is negative and \(c_i\) is zero. The mean-field is expressed typically as

However, this model is elementary in view of testing the perspectives of a data field theory. We need to increase its complexity in order to reach a description of the system sufficiently detailed to catch the global features of its data space. We generalize the mean field in such a way that it crucially depends on those topological features of the space of antibody concentrations that will be reflected in the topological properties of the system space of data. We construct a model sensitive to global features, designed to benefit of the advantage of lending itself to a kind of *reverse engineering* of the process of field construction. In the model, the antibodies with positive \(c_i \, (=1)\) are actually produced by the system while those absent \((c_i=0)\) are suppressed. Suppression due to clonal abortion is neglected.

## 3 The topological field model for antigen-free immune system

In this section, we generalize the way how Parisi’s linear model represents immunological memory by a linear mean field. The antibodies of the idiotypic cascade are denoted by \(Ab_i\); during the production of \(Ab_1\), ignited directly by the antigen, the environment of lymphocytes is modified by \(Ab_2\): the life-span of the \(Ab_1\)-producing cells and the population of helper cells specific for \(Ab_1\) increase. The symmetry of the couplings \((J_{ik}=J_{ki})\) implies that \(Ab_3\) should be rather similar to \(Ab_1\), the internal image of \(Ab_2\) should persist after it disappeared, its presence induces the survival of memory cells directed against the antigen. The process continues by iteration. In the extended model, we assume the production of \(Ab_i\) is conditioned to different extents and also by the simultaneous presence of a subset of \(2, 3, ... , N\), antibodies.

A crucial assumption we add to the model is that the coupling constants \(J_{\ell _1 \dots \ell _k \dots \ell _{n+1}}\) are taken to be proportional to a linear combination (with negative coefficients) of the simplex \(n\)-volume \(V^{(n)}\), the simplex corresponding to the cell defined by the set \(\bigl \{ \ell _1 , \dots , \ell _n \bigr \}\) in the cells of cycle \(C^{(n)} ([\ell _1, \dots , \ell _{(n+1)} ])\), with the volume of the cell boundary of dimension \(n-2\), weighted by the curvature at that boundary. The latter measures the ease with which the \(n\)-body interaction is favored by the manifold bending. The ensuing action is expected to measure reasonably well the probability that the \(n\)-body process described by that coupling takes place.

When the model with such interaction form is dealt with as a statistical field theory it turns out to be fully isomorphic with a Euclidean topological field theory describing a totally different physical system: gravity coupled with matter in a simplicial complex setting, consistent with general relativity. We think back to the standard example of the Ising model, which also has variables in \({\mathbb {Z}}_2\) (Parisi 1998) and recall that a statistical field theory is any model in statistical mechanics where the degrees of freedom comprise a field; i.e. the microstates of the system are expressed through field configurations. The features of the ensuing theory are quite general and far reaching. The topology of the associated moduli space depends only on the manifold genus \(g\), on the dimension \(n\) of the (vector) bundle over it used to define the field, and on the dimension \(\delta ({\mathrm{mod}} \, n)\) of the associated determinant bundle. Such space is a projective variety, smooth only if \((\delta ,n) = 1\). The recursive determination of the Betti numbers in this case is given by the Harder and Narasimhan and Atiyah and Bott recursions (Harder and Narasimhan 1975; Atiyah and Bott 1983). The former explicitly counts points of the moduli space, the latter resorts to an infinite-dimensional Morse theory with the field action functional as Morse function. These recursions lead to a closed formula for the Poincaré polynomial, i.e. for the Betti numbers of the moduli space. These implicit methods were successively made explicit (Desale and Ramanan 1975).

What is intriguing is that our field theory turns out to be isomorphic to \({\mathbb {Z}}_2\) (quantum) gravity, dealt with in nonperturbative fashion by standard Regge calculus (Regge 1961).

Let us recall here that the construction of a consistent theory of quantum gravity in the continuum is a problem in theoretical physics that has so far defied all attempts of a rigorous formulation and resolution. The only effective approach to try and obtain a non-trivial quantum theory proceeded via discretization of space-time and of the Einstein action, i.e., by replacing the space-time continuum by a combinatorial simplicial complex and deriving the action from simple physical principles.

Quantum Regge calculus, based on the well-explored classical discretization of the Einstein action due to Regge, and the essentially equivalent method of dynamical triangulations are the tools that proved most successful. Regge’s method consists in approximating Einstein’s continuum theory by a simplicial discretization of the space-time (in gravity a four-dimensional Lorentz manifold) resorting to local building blocks (simplices) and then constructing the gravitational action as the sum of a term depending on the (hyper)volumes of the different simplicial complexes and another reflecting the space-time curvature. The metric tensor associated with each simplex is expressed as a function of the squared edge lengths, which are the dynamical variables of this model. Summing over all interpolating geometries (state sum) generated by the simplicial complex construction in the embedding higher-dimensional ones (filtration), allows us to derive both the Einstein action and the equilibrium configurations simply by means of counting procedure (entropy estimate).

The \({\mathbb {Z}}_2\) version of the model is one in which representations of \(SU(2)\) labeling the edges in quantum Regge calculus are reduced to \({\mathbb {Z}}_2\). The power of the method resides in the property that the infinite degrees of freedom of Riemannian manifolds are reduced by discretization; and the theory can deal with PL spaces, described by a finite number of parameters. Moreover, for the manifolds approximated by a simplicial complex (or by dynamically triangulated random surfaces), the local coordination numbers are automatically included among the dynamical variables, leaving the quadratic link lengths \(q_\ell \), globally constrained by triangle inequalities, as true degrees of freedom.

More precisely, the model adopted here for the immune system is isomorphic to the \({\mathbb {Z}}_2\) Regge model, where the quadratic link lengths \(q_\ell \) of the simplicial complexes are restricted to take on only two values: \(q_\ell = 1 + {\mathfrak {l}} \sigma _\ell \), where \(\sigma _\ell = \pm 1 = 2 c_{\ell } - 1\). Such model has been exactly solved (in the case of quantum gravity) via the matrix model approach (Ambjørn et al. 1985) and with the help of conformal field theory (Knizhnik et al. 1988). A crucial ingredient is the choice of functional integration measure, whose behavior, with respect to diffeomorphisms, is fundamental. The very definition of diffeomorphism is a heavy constraint in constructing the PL space exactly invariant under the action of the full diffeomorphism group (Menotti 1998), and only the recent construction of a simplicial version of the mapping class group made it viable (Merelli and Rasetti 2013).

It is worth recalling that in (Desale and Ramanan 1975) arithmetic techniques and the Weil conjecture were used, and a crucial ingredient was the property that the volume of a particular locally symmetric space attached to \(SL_n\) with respect to the canonical measure—an invariant known as the Tamagawa number of \(SL_n\)—equals 1. The simplicial volume is a homotopy invariant of oriented, closed, connected manifolds defined in terms of the singular chain complex with real coefficients. Such invariant measures the efficiency of representing the fundamental space class using singular simplices. Since the fundamental class is nothing but a generalized triangulation of the manifold, the simplicial volume can be interpreted as well both as a measure for the complexity of the manifold and as a homotopy invariant approximation of the Riemannian volume. \(Z(x)\) provides then the generating function (Poincaré polynomial) of the Betti numbers of \(\mathcal {A}\).

The final step is to compare the Betti numbers obtained empirically from the data against such generating function, thus determining [simply through the solution of a system of (non-linear) algebraic equations] the set of non-zero \(J_{\ell _1 \dots \ell _k \dots \ell _{n+1}}\). This fully determines which antibody influences which, including ‘many-body’ influences, i.e. when and if it may happen that a given set of (two or more than two) antibodies play a role only when simultaneously active.

A short discussion of Regge calculus, meant to introduce in simple way, accessible also to readers not familiar with the notion of geometry over discrete spaces (simplicial complexes), and some of the notions actually used in the derivation can be found in Battaglia and Rasetti (2003), where some of the preliminary ideas of the scheme are described, successively developed in extended way for present and other applications. As for the work in \(\mathbb {Z}_2\) quantum gravity which our generalized model of immune system is isomorphic to, a more articulated and complete set of references is available in Giulini (2007) and Bittner et al. (1999).

## 4 A global topological application of the S[B] paradigm

In this section we introduce the \(S[B]\) paradigm for modeling complex adaptive systems and we discuss the IS metaphor as a global topological application of the adaptation phase; the aim is to contribute to understand the adaptability feature that, as addressed in the paper of Stepney et al. (2005), still remains ‘poorly understood’.

In the \(S[B]\) paradigm a complex system consists of two components, the computation level from where its behavior \(B\) emerges, the interactive machine, and the context of the computation, its global structure \(S\). Both levels are distinct but entangled in a unique computational model that evolves by learning and adapting. The computational model associated to the \(S[B]\) plays a crucial role in the characterization of the adaption phase, it can be represented by any mathematical model of computation, provided that it allows to express the dependency between different levels of abstraction.

A full yet concise description of the formal definition of \(S[B]\) on a finite state machine that encapsulates both the computation \((B)\) and its controller \((S)\) follows. In this framework, both \(B\) and \(S\) are classically described as a finite state machine of the form \(B=(Q, q_0, \rightarrow _B)\) (\(Q\) set of \(B\) states, \(q_0\) initial \(B\) state and \(\rightarrow _B\) transition relation) and \(S = (R, r_0, \mathcal {O}, \rightarrow _S, L)\) where \(R\) is a set of \(S\) states, \(r_0\) is the initial \(S\) state, \(\mathcal {O}\) is an observation function of \(B\) states, \(\rightarrow _S\) is a transition relation and \(L\) is a state labeling function. The function \(L\) labels each \(S\) state with a formula representing a set of constraints over an *observation* of the \(B\) states. Therefore, a \(S\) state \(r\) can be directly mapped to the set of \(B\) states satisfying \(L(r)\). Through this hierarchy, \(S\) can be viewed as a *second-order* structure \((R \subseteq 2^Q, r_0,\rightarrow _S \subseteq 2^Q \times 2^Q, L)\) where each \(S\) state \(r\) is identified with its corresponding set of \(B\) states. An \(S[B]\)*system* is the combination of an interactive machine \(B=(Q, q_0, \rightarrow _B)\) and a coordinator \(S=(R, r_0,\mathcal {O}, \rightarrow _S, L)\) such that for all \(q \in Q\), \(\mathcal {O}(q) \ne \perp \). In any \(S[B]\) system the initial \(B\) state must satisfy the constraints of the initial \(S\) state, i.e. \(q_0 \models L(r_0)\).

During adaptation phase the \(B\) machine is no longer regulated by the \(S\) controller, except for a condition, called *transition invariant*, that must be fulfilled by the system undergoing adaptation. The complete and formal definition of the \(S[B]\) based on finite state machine, its semantics and the *adaptability checking* can be found in Merelli et al. (2013).

*recurrent neural network*, a process based on active exploration of an unknown environment and the generation of a finite state automata model of the environment.

Summarizing, inspired by the IS metaphor we present a computational model as an higher order relational model which deals with multilinear n-body interactions, the interactions characteristic of the immune response. In such case, the model adapts when it no longer fits the space of observed data, and the construction of the topological field model allows us to determine the values of the \(J_{{\{\ell \}}}\) matrix, hence, e.g., the classes of antibodies that are in relation in the current immune response. We call this step a recursive construction of a relational model that learns new antibody relations as immune response to the presence of an antigene.

As future work, we aim to apply the proposed approach to real-world IS phenomena treated bot in *silico* and in *vivo* experiments and compare the results with other similar models.

## 5 Concluding remarks

We have defined a new topology-based method suitable to provide a benchmarking application of the S[B] paradigm. The method relies on a multi-linear model of immune system inspired by the topology of space of data. Starting from the notion of an Ising model in a mean field, given by Parisi and others in their seminal work, we proposed a more sophisticated version that is multilinear in the configurational variables (the antibody concentrations) instead of constant or at most linear. This work is not intended to be the study of the dynamics of the immune network in view of establishing the equilibrium among antibodies, but, instead, it has a prospective interest and strategic aim at defining a new approach for the analysis of the immune system as a metaphor of a real-life system represented in terms of Big Data.

## Acknowledgments

We thank the organizers of the ICARIS Workshop for offering the opportunity to present new ideas related to the TOPDRIM project in such a stimulating and pleasant environment. We acknowledge the financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme (FP7) for Research of the European Commission, under the FET-Proactive grant agreement TOPDRIM, number FP7-ICT-318121.

## Copyright information

**Open Access**This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.