How should complexity scale with system size?

Abstract. We study how statistical complexity depends on the system size and how the complexity of the whole system relates to the complexity of its subsystems.
We examine this size dependence for two well-known complexity measures, the excess entropy of Grassberger and the neural complexity introduced by Tononi, Sporns and Edelman. We compare the results to properties one might wish to impose on a complexity measure when seeking an axiomatic characterization. It turns out that these two measures do not satisfy all of those requirements, but a renormalized version of the TSE-complexity behaves reasonably well.



Introduction
One would expect measuring complexity to be a central topic in the field of complex systems. But this is not the case. Instead, the notion of complexity is very often used informally, in the hope that readers will understand what the authors mean. Even though this is often the case, we believe that more care should be devoted to the use of the notion of complexity, at least within the field of complex systems itself, and that further development of complexity measures would help considerably.
The quest for complexity measures has already produced sound results in its long history. On a general level one can distinguish between the complexity of describing and the complexity of interpreting [1]. The first refers to the problem of describing a system, while the latter is related to the difficulty of constructing the system given the description. One can consider these two aspects as the encoding and the decoding aspect of complexity. In the following we consider only descriptional complexity. Moreover, we will not consider the complexity of a single object but the statistical complexity of an ensemble of objects, which allows us to exclude from the description all features that are randomly distributed in the ensemble. In the case of a single object, algorithmic complexity is the appropriate setting, and the entropy itself is the appropriate statistical measure [2]. In the case of an ensemble, we are not interested in every detail of the objects, but only in their structure, i.e. the non-random part. The corresponding measures are often called statistical or structural complexity [3,4].
Following the line of research started in [5], in this paper we study two measures of statistical complexity: the excess entropy [6], originally introduced as effective measure complexity in a time-series context [4], and the TSE-complexity [7]. We will show that both measures are closely related.
We study how both measures behave as the system size is increased and discuss three special cases: (i) adding one independent element, (ii) adding an independent system, and (iii) adding an exact copy of the first system.
We will discuss how these results correspond to what we expect from a complexity measure, and we propose a modification of the TSE-complexity based on these results. Furthermore, we hope that the present results will contribute to a better understanding and a specification of the information-geometric class of complexity measures that we introduced in [5].

Information-theoretic quantities

We consider a system of random variables X_V = (X_v)_{v \in V} over a finite node set V; for a subset C \subseteq V, X_C denotes the corresponding subsystem with distribution p(x_C). The entropy of X_C is

H(X_C) = -\sum_{x_C} p(x_C) \log_2 p(x_C).

This quantity is a natural measure of the uncertainty that one has about the outcome of measuring X_C, that is, the information one expects to gain by observing that outcome. Another interpretation of H(X_C) is to consider it as the variety of X_C.
Knowing the state of B reduces the uncertainty that one has about the state of C. The remaining uncertainty is quantified by the conditional entropy of X_C given X_B:

H(X_C \mid X_B) = H(X_C, X_B) - H(X_B).

In terms of these entropy measures, the mutual information of X_C and X_B is given by

I(X_C : X_B) = H(X_C) - H(X_C \mid X_B),

which measures the reduction of the uncertainty about the outcome of X_C given the outcome of X_B, and vice versa. The conditional mutual information, defined as

I(X_C : X_B \mid X_A) = H(X_C \mid X_A) - H(X_C \mid X_A, X_B),

quantifies the average reduction of the uncertainty about the outcome of measuring X_C from knowing the state of B, if the state of A was already known.
The multi-information is a generalization of the mutual information to more than two random variables. The multi-information of the system X_V with respect to its nodes is defined as

I(X_V) = \sum_{v \in V} H(X_{\{v\}}) - H(X_V).   (4)

It is the difference between the sum of the varieties of the elements and the variety of the system as a whole. It is often considered as a measure of the total statistical interdependence of the nodes with respect to the joint distribution p, but it remains unclear at this point what total statistical interdependence exactly means, as long as one does not consider (4) itself as its definition.
In the literature it is also referred to as integration (see e.g. [7]). It becomes zero if and only if the probability distribution p has the product structure

p(x) = \prod_{v \in V} p_v(x_v),

where each p_v denotes the marginal distribution of the projection X_{\{v\}}. In particular, the multi-information vanishes in the case of complete randomness, given by the uniform distribution, and in the case of complete determinism, given by a distribution concentrated on a single configuration.
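These quantities are easy to evaluate for small binary systems given as explicit joint distributions. The following sketch is our own illustration (the helper names `entropy` and `multi_information` are not from the paper); it computes the entropy and the multi-information (4) for two extreme cases, a synchronized pair and an independent pair:

```python
import itertools
from math import log2

def entropy(p, nodes):
    """Shannon entropy (in bits) of the marginal of `p` on `nodes`.
    `p` maps full configurations (tuples) to probabilities."""
    marginal = {}
    for x, px in p.items():
        key = tuple(x[v] for v in nodes)
        marginal[key] = marginal.get(key, 0.0) + px
    return -sum(q * log2(q) for q in marginal.values() if q > 0)

def multi_information(p, nodes):
    """Multi-information: sum of single-node entropies minus the joint entropy."""
    return sum(entropy(p, [v]) for v in nodes) - entropy(p, nodes)

# Two perfectly synchronized fair bits: maximal dependence, I = 1 bit.
p_sync = {(0, 0): 0.5, (1, 1): 0.5}
# Two independent fair bits: the multi-information vanishes.
p_indep = {x: 0.25 for x in itertools.product((0, 1), repeat=2)}
```

For `p_sync` the multi-information is 1 bit; for `p_indep` it is zero, in line with the product-structure criterion above.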

Information-theoretic complexity measures
The information-theoretic quantities of Section 2 can be used to define complexity measures for random fields. We shall consider here two measures of structural complexity: the excess entropy and the TSE-complexity.

Excess entropy
One possibility for introducing a measure of statistical complexity is to ask to what extent the state of a subsystem remains uncertain when the state of the rest of the system is known. This uncertainty is given by the conditional entropy H(X_A \mid X_{V \setminus A}). It quantifies the extent to which the state cannot be explained by dependencies within the system and is therefore regarded as random.
In particular, one can ask this question for every single element of the system. The excess entropy is then the difference between the uncertainty of the state of the whole system and the sum of the irreducible uncertainties of the states of the elements, using all information available in the system:

E(X_V) = H(X_V) - \sum_{v \in V} H(X_v \mid X_{V \setminus \{v\}}).   (6)

It quantifies the "explainable" part of the variety of the system. The excess entropy as the non-extensive part of the entropy was originally introduced as a complexity measure for time series [3,4,9] and provides in this context a lower bound for the amount of memory one needs for an optimal prediction. The excess entropy (6) is, however, not a direct generalization of the measures for time series, because in the latter case the conditioning in the second term is restricted to observables from the past only.

Formula (6) can be rewritten using the average entropies of subsets of size k, denoted by

H(k, N) = \binom{N}{k}^{-1} \sum_{A \subseteq V, |A| = k} H(X_A).

We then get the expression

E(X_V) = N H(N-1, N) - (N-1) H(N, N),

which is illustrated in Figure 1. As a complexity measure for finite systems it was mentioned in passing in [10]; see also [5] for a more comprehensive discussion of the relation between the time-series case and the case of finite systems.

The excess entropy (6) has the following properties (for all proofs see Appendix A):
1. The excess entropy increases monotonically with the system size, because

E(X_{V \cup \{v\}}) = E(X_V) + \sum_{w \in V} I(X_w : X_v \mid X_{V \setminus \{w\}}),   (7)

with v denoting the additional element and every conditional mutual information being nonnegative.
2. The excess entropy of a system consisting of two subsystems A and B decomposes as

E(X_{A \cup B}) = E(X_A) + E(X_B) - I(X_A : X_B) + \sum_{v \in A} I(X_v : X_B \mid X_{A \setminus \{v\}}) + \sum_{v \in B} I(X_v : X_A \mid X_{B \setminus \{v\}}),   (8)

and it is always at least the mutual information between the two subsystems: E(X_{A \cup B}) \geq I(X_A : X_B).
3. The excess entropy of the union of two subsystems is always at least the excess entropy of either of the subsystems.
4. In general the sum of the excess entropies of the subsystems is neither a lower nor an upper bound for the excess entropy of the whole system.
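The definition (6) can be checked directly on a small example. The sketch below is our own illustration (the helper `excess_entropy` is not code from the paper); it evaluates the excess entropy for a three-bit parity distribution:

```python
from math import log2

def entropy(p, nodes):
    """Shannon entropy (bits) of the marginal of `p` on `nodes`."""
    marginal = {}
    for x, px in p.items():
        key = tuple(x[v] for v in nodes)
        marginal[key] = marginal.get(key, 0.0) + px
    return -sum(q * log2(q) for q in marginal.values() if q > 0)

def excess_entropy(p, nodes):
    """E(X_V) = H(X_V) - sum_v H(X_v | X_{V \ {v}}), cf. Eq. (6)."""
    nodes = list(nodes)
    h_total = entropy(p, nodes)
    # H(X_v | rest) = H(X_V) - H(X_{V \ {v}})
    residual = sum(h_total - entropy(p, [w for w in nodes if w != v])
                   for v in nodes)
    return h_total - residual

# Three bits where the third is the XOR of the first two: every element
# is fully determined by the rest, so E(X_V) equals H(X_V) = 2 bits.
p_parity = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
```

For this distribution E(X_V) = 2 bits, while any two-element subsystem looks completely random and has zero excess entropy, consistent with the monotonicity property above.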

TSE complexity
Tononi et al. [7] introduced a complexity measure called "neural complexity". It was motivated by the attempt to measure the potential ability of a neural system to produce consciousness in the framework of the information integration theory of consciousness and the dynamical core hypothesis [11]. (In their more recent work, e.g. [12], Tononi et al. used a different measure which also takes causal effects into account.) Starting from the intuition that conscious experience is very rich, they first required that a corresponding neural system should have a large number of available states. On the other hand, consciousness is experienced as a unity. Tononi et al. translated this intuition into the requirement that the corresponding systems should have both high entropy and a high integration or multi-information (4) on the system level. In the following we denote their complexity measure by C_TSE, where TSE stands for Tononi-Sporns-Edelman. They defined it as

C_{TSE}(X_V) = \sum_{k=1}^{N} \left[ \frac{k}{N} I(N) - I(k, N) \right],   (9)

with the abbreviations I(N) = I(X_V) for the multi-information of the whole system and

I(k, N) = \binom{N}{k}^{-1} \sum_{A \subseteq V, |A| = k} I(X_A)

for the average multi-information of subsystems of size k. The complexity is the larger, the more the increase of the integration with the size of the subsystems deviates from a linear one.
One can express C_TSE also using the mean entropies of the subsystems of size k, or the mean mutual information for bipartitions into subsystems of size k and N-k. A particularly interesting representation shows its relation to the excess entropy. If we denote the mean excess entropy of all subsystems with k elements by E(k, N), we get for the TSE-complexity

C_{TSE}(X_V) = \frac{1}{2} \sum_{k=1}^{N} E(k, N).   (10)

For the proof see Appendix B. Using

E(k, N) = \binom{N}{k}^{-1} \sum_{A \subseteq V, |A| = k} E(X_A),   (11)

we can write the TSE-complexity also as a weighted sum over the excess entropies of all subsets:

C_{TSE}(X_V) = \frac{1}{2} \sum_{A \subseteq V} \binom{N}{|A|}^{-1} E(X_A).

If we interpret the excess entropy as the complexity of a system with respect to its elements, the TSE-complexity measures the sum of the complexities on all levels with respect to the basic level. Maximizing the TSE-complexity should therefore lead to systems which are "complex" on all levels.

How does the TSE-complexity grow with the system size? To investigate this question it is useful to start from the most basic representation using the entropies,

C_{TSE}(X_V) = \sum_{k=1}^{N} \left[ H(k, N) - \frac{k}{N} H(N, N) \right],   (12)

which is illustrated in Figure 1. After adding one element v', we not only have to replace N by N+1, but we also have to take into account that the averages now include additional subsets. In fact, because \binom{N}{k} is the number of subsets with k elements out of N elements, we have

\binom{N+1}{k} = \binom{N}{k} + \binom{N}{k-1},   (13)

where the second term is the number of new subsystems containing v'. Thus, with H(k, N) denoting the mean entropy of subsystems of size k in the system with N elements, inserting this in equation (12) and rearranging the summation we get (see Appendix C)

C_{TSE}(X_{V \cup \{v'\}}) = \frac{N+2}{N+1} C_{TSE}(X_V) + \Delta(v'),   (14)

where the second term \Delta(v') collects weighted conditional mutual informations between X_{v'} and the remaining elements, conditioned on subsets of V; it vanishes if v' is independent of X_V.
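The relation between the entropy representation (12) and the mean subset excess entropies can be verified numerically. The sketch below is our own illustration (all helper names are ours, not from the paper); it computes C_TSE from the mean subset entropies H(k, N) and compares it with half the sum of the mean excess entropies:

```python
from itertools import combinations
from math import log2

def entropy(p, nodes):
    marginal = {}
    for x, px in p.items():
        key = tuple(x[v] for v in nodes)
        marginal[key] = marginal.get(key, 0.0) + px
    return -sum(q * log2(q) for q in marginal.values() if q > 0)

def mean_entropy(p, nodes, k):
    """H(k, N): entropy averaged over all subsets of size k."""
    subsets = list(combinations(nodes, k))
    return sum(entropy(p, s) for s in subsets) / len(subsets)

def excess_entropy(p, nodes):
    nodes = list(nodes)
    h = entropy(p, nodes)
    return h - sum(h - entropy(p, [w for w in nodes if w != v]) for v in nodes)

def tse_complexity(p, nodes):
    """C_TSE as sum_k [H(k,N) - (k/N) H(N,N)], cf. Eq. (12)."""
    N = len(nodes)
    h_full = entropy(p, nodes)
    return sum(mean_entropy(p, nodes, k) - k * h_full / N
               for k in range(1, N + 1))

def mean_excess(p, nodes, k):
    """E(k, N): excess entropy averaged over all subsets of size k."""
    subsets = list(combinations(nodes, k))
    return sum(excess_entropy(p, s) for s in subsets) / len(subsets)

# Three-bit parity distribution as a test case.
p_parity = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
nodes = [0, 1, 2]
c_tse = tse_complexity(p_parity, nodes)
half_sum = 0.5 * sum(mean_excess(p_parity, nodes, k) for k in range(1, 4))
```

For the parity example both expressions evaluate to 1 bit, illustrating the equality between the two representations.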

Maximizers of the integration and the excess entropy
A representation similar to Figure 1 can also be used to understand the properties of the distributions that maximize the integration (4) and the excess entropy (6) for a system with a fixed number of binary elements (x v ∈ {0, 1}). For a more detailed discussion of this topic see [5]. Figure 2 shows the behaviour of H(k, N ) as a function of k. If the integration is maximized, the entropy of the whole system is equal to the marginal entropy of a single element, which has to be maximal, i.e. 1 bit. The fact that the entropy does not increase further with increasing k implies that the state of the other elements is a function of the state of a single element. One can say that the system is synchronized.
On the other hand, the excess entropy is maximized by a distribution which looks like independent random variables up to subsets of size N-1, while the N-th element is a function of the other N-1 elements. These conditions are fulfilled by the parity function, x_N = x_1 \oplus x_2 \oplus \cdots \oplus x_{N-1}, with x_1, \ldots, x_{N-1} independent fair coins. A distribution that maximizes the TSE-complexity has to lie somewhere between the two cases. One might expect that for such maximizers H(k, N) increases with the maximal possible slope of 1 up to a subsystem size of N/2 and then remains constant. Such a behaviour of H(k, N), however, is in general not achievable due to combinatorial constraints.
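The two entropy profiles H(k, N) described above can be reproduced numerically. The following sketch is our own illustration (helper names are ours) for N = 4 binary elements:

```python
from itertools import combinations, product
from math import log2

def entropy(p, nodes):
    marginal = {}
    for x, px in p.items():
        key = tuple(x[v] for v in nodes)
        marginal[key] = marginal.get(key, 0.0) + px
    return -sum(q * log2(q) for q in marginal.values() if q > 0)

def profile(p, nodes):
    """H(k, N) for k = 1, ..., N."""
    out = []
    for k in range(1, len(nodes) + 1):
        subs = list(combinations(nodes, k))
        out.append(sum(entropy(p, s) for s in subs) / len(subs))
    return out

N = 4
# Maximizer of the integration: all elements synchronized.
p_sync = {(0,) * N: 0.5, (1,) * N: 0.5}
# Maximizer of the excess entropy: parity function on N bits.
p_parity = {x + (x[0] ^ x[1] ^ x[2],): 1 / 8
            for x in product((0, 1), repeat=N - 1)}

prof_sync = profile(p_sync, list(range(N)))
prof_parity = profile(p_parity, list(range(N)))
```

For the synchronized system the profile stays flat at 1 bit for all k; for the parity system it rises with slope 1 up to k = N-1 and then remains at N-1 bits.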

Special cases
Let us now consider the three special cases of (i) adding an independent element, (ii) the union of two independent subsystems, and (iii) the union of two identical copies. What would we require of a reasonable complexity measure?
(i) Additional independent element: the element has no structure itself, so it has no complexity of its own. Because it is independent of the rest of the system, the complexity should not change. (ii) Union of two independent systems: because there are no dependencies between the two systems, the complexity of the union should simply be the sum of the complexities of the subsystems. (iii) Union of two identical copies: because no additional information is needed to describe the second system, one could argue that the complexity should equal the complexity of one system. One has, however, to include in the description the fact that the second system is a copy of the first one. At least this part should not be extensive with respect to the system size.

Adding an independent node
From (7) we get

E(X_{V \cup \{v\}}) = E(X_V),

i.e. the excess entropy remains constant: the added node is independent of the rest, so all conditional mutual information terms vanish. Because the excess entropy of a single node is zero, adding an independent node leaves the excess entropy unchanged. For the TSE-complexity we can use equation (14). The second term of the sum vanishes because of the independence, and we obtain

C_{TSE}(X_{V \cup \{v\}}) = \frac{N+2}{N+1} C_{TSE}(X_V).

This shows that the TSE-complexity increases when an independent element is added.
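Both statements can be checked numerically. In our own sketch below (the helpers `entropy`, `excess_entropy` and `tse_complexity` are our names, not the paper's), an independent fair bit is added to a correlated pair; the excess entropy is unchanged while the TSE-complexity grows by the factor (N+2)/(N+1) = 4/3:

```python
from itertools import combinations
from math import log2

def entropy(p, nodes):
    marginal = {}
    for x, px in p.items():
        key = tuple(x[v] for v in nodes)
        marginal[key] = marginal.get(key, 0.0) + px
    return -sum(q * log2(q) for q in marginal.values() if q > 0)

def excess_entropy(p, nodes):
    nodes = list(nodes)
    h = entropy(p, nodes)
    return h - sum(h - entropy(p, [w for w in nodes if w != v]) for v in nodes)

def tse_complexity(p, nodes):
    N = len(nodes)
    h_full = entropy(p, nodes)
    total = 0.0
    for k in range(1, N + 1):
        subs = list(combinations(nodes, k))
        total += sum(entropy(p, s) for s in subs) / len(subs) - k * h_full / N
    return total

# Correlated pair of bits.
p2 = {(0, 0): 0.4, (1, 1): 0.4, (0, 1): 0.1, (1, 0): 0.1}
# Same pair with an additional independent fair bit.
p3 = {(a, b, c): pab * 0.5 for (a, b), pab in p2.items() for c in (0, 1)}

e2, e3 = excess_entropy(p2, [0, 1]), excess_entropy(p3, [0, 1, 2])
c2, c3 = tse_complexity(p2, [0, 1]), tse_complexity(p3, [0, 1, 2])
```

Here `e3 == e2`, while `c3` equals `c2` multiplied by 4/3, as adding the independent third element corresponds to N = 2.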

Two independent subsystems
For the excess entropy we have from (8) the general result that the excess entropy of a system composed of two independent subsystems is the sum of the excess entropies of the subsystems:

E(X_{A \cup B}) = E(X_A) + E(X_B).   (17)

For the TSE-complexity this case is already a nontrivial one, because the subsystems of the whole system can contain elements from both subsystems. But using the result (17), we have for an arbitrary subset Y \subseteq A \cup B

E(X_Y) = E(X_{Y \cap A}) + E(X_{Y \cap B}).

By using this property, we can show (see Appendix D) that

C_{TSE}(X_{A \cup B}) = \frac{N+1}{N_A+1} C_{TSE}(X_A) + \frac{N+1}{N_B+1} C_{TSE}(X_B),   (18)

with N = N_A + N_B, where N_A denotes the number of elements in A and N_B the number of elements in B. Obviously, in contrast to the excess entropy, the TSE-complexity of the union of two independent components is not the sum of the TSE-complexities of the parts. But we can re-establish this result for a renormalized version of the TSE-complexity,

\tilde{C}_{TSE}(X_V) = \frac{C_{TSE}(X_V)}{N+1},   (19)

for which \tilde{C}_{TSE}(X_{A \cup B}) = \tilde{C}_{TSE}(X_A) + \tilde{C}_{TSE}(X_B). Note that this renormalization also restores the intuitive property that the complexity remains unchanged if a single independent node is added.
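The additivity of the renormalized version (19) can be illustrated numerically. In the sketch below (our own; helper names are ours), two independent synchronized pairs are combined into a four-element system:

```python
from itertools import combinations
from math import log2

def entropy(p, nodes):
    marginal = {}
    for x, px in p.items():
        key = tuple(x[v] for v in nodes)
        marginal[key] = marginal.get(key, 0.0) + px
    return -sum(q * log2(q) for q in marginal.values() if q > 0)

def tse_complexity(p, nodes):
    N = len(nodes)
    h_full = entropy(p, nodes)
    total = 0.0
    for k in range(1, N + 1):
        subs = list(combinations(nodes, k))
        total += sum(entropy(p, s) for s in subs) / len(subs) - k * h_full / N
    return total

# A and B are each a synchronized pair of fair bits; A and B are independent.
p_pair = {(0, 0): 0.5, (1, 1): 0.5}
p_joint = {xa + xb: pa * pb for xa, pa in p_pair.items()
           for xb, pb in p_pair.items()}

c_part = tse_complexity(p_pair, [0, 1])        # C_TSE of A (equal for B)
c_whole = tse_complexity(p_joint, [0, 1, 2, 3])

# Renormalized version: divide by (number of elements + 1).
ct_part = c_part / 3
ct_whole = c_whole / 5
```

The raw TSE-complexities are 1/2 per pair and 5/3 for the union, so the raw measure is not additive; after renormalization, 1/3 = 1/6 + 1/6.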

Two copies of the same system
Now we consider again two subsystems X_A and X_B, but now X_B is an exact copy of X_A. Thus

p(x_A, x_B) = p(x_A) \delta_{x_A}(x_B).

It follows that H(X_A, X_B) = H(X_A) and, for any v \in A \cup B,

H(X_v \mid X_{(A \cup B) \setminus \{v\}}) = 0,

because the state of every element is determined by its copy in the other subsystem. Therefore we get for the excess entropy

E(X_{A \cup B}) = H(X_A).

The main problem with this result is that the "complexity" of two identical copies, as measured by the excess entropy, is independent of the complexity of the single system, which is clearly counterintuitive and shows a severe limitation of the excess entropy as a complexity measure for finite systems.
The situation for the TSE-complexity is more complicated. Using a reasoning similar to the case of the two independent subsystems (see Appendix E), one finally arrives at the decomposition (21), which leads to a very reasonable lower bound for the renormalized TSE-complexity: \tilde{C}_{TSE}(X_{A \cup B}) \geq \tilde{C}_{TSE}(X_A). The individual summands in the second term of (21) cannot be simplified further; one can, however, try to derive an upper bound. According to our requirements for a complexity measure, the second term in (21) should grow more slowly than the system size. At the moment, however, it is an open problem under which conditions this applies.
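The failure of the excess entropy for duplicated systems is easy to reproduce numerically. In our own sketch below (helper names are ours), the excess entropy of a doubled system equals the entropy of the single system, regardless of how small the single system's own excess entropy is:

```python
from math import log2

def entropy(p, nodes):
    marginal = {}
    for x, px in p.items():
        key = tuple(x[v] for v in nodes)
        marginal[key] = marginal.get(key, 0.0) + px
    return -sum(q * log2(q) for q in marginal.values() if q > 0)

def excess_entropy(p, nodes):
    nodes = list(nodes)
    h = entropy(p, nodes)
    return h - sum(h - entropy(p, [w for w in nodes if w != v]) for v in nodes)

# A correlated pair A, and the system A ∪ B where B is an exact copy of A.
p_a = {(0, 0): 0.4, (1, 1): 0.4, (0, 1): 0.1, (1, 0): 0.1}
p_double = {x + x: px for x, px in p_a.items()}

e_single = excess_entropy(p_a, [0, 1])
e_double = excess_entropy(p_double, [0, 1, 2, 3])
h_single = entropy(p_a, [0, 1])
```

Here `e_double` equals `h_single` (about 1.72 bits), not a function of `e_single` (about 0.28 bits): the duplicated system's excess entropy reflects only the entropy of the original, as derived above.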

Discussion
We studied how two established measures of statistical complexity behave when the system size is increased and compared the results with intuitive requirements for complexity measures in three special cases: (i) adding one independent element, (ii) adding an independent system, and (iii) adding a perfect copy of the original system. While the excess entropy behaves as one would expect in cases (i) and (ii), the result in case (iii) did not meet our intuitions: the excess entropy of the two copies does not depend on the complexity of the original system at all. This indicates a possible limitation of the excess entropy as a general complexity measure, because it relates only two levels to each other, the level of the elements and the system level, and ignores the intermediate levels.
This disadvantage might be avoided by the TSE-complexity because it considers all possible subsets of a given system. We could show this by writing the TSE-complexity as a sum, over all subsystem sizes, of the mean excess entropies of subsystems of that size. The further analysis of the behavior of the TSE-complexity revealed, however, that it grows even if one adds independent elements, which again is a counterintuitive and therefore unwanted property. The detailed analysis of the case of two independent systems showed that this unwanted property can be avoided by dividing the TSE-complexity by N+1 if the system has N elements. Therefore we argue that the normalized version (19) provides a better measure of statistical complexity than the original TSE-complexity. It is, however, not clear to what extent the normalized TSE-complexity avoids the problem of the excess entropy in the case of two identical copies, because we found no simple interpretation of the second term in (21).
We hope that the insights obtained in this paper will also help to advance our geometric approach to complexity presented in [5]. This, however, requires more sophisticated mathematical considerations and will therefore be presented elsewhere.
Funded by Volkswagen Foundation.

The European Physical Journal B

Appendix A: Excess entropy of two subsystems
The excess entropy of a random variable X_V = X_{A \cup B} is defined as

E(X_{A \cup B}) = H(X_{A \cup B}) - \sum_{v \in A \cup B} H(X_v \mid X_{(A \cup B) \setminus \{v\}}).   (23)

Now let us label the elements of A in an arbitrary order as v^A_j, j = 1, \ldots, N_A, and the elements of B as v^B_k, k = 1, \ldots, N_B. Moreover, let A_k be the set containing the v^A_j with 1 \leq j \leq k, and let B_k be the set containing the v^B_j with 1 \leq j \leq k. Using the chain rule H(X_{A \cup B}) = H(X_A) + H(X_B \mid X_A) together with

H(X_B \mid X_A) = \sum_{k=1}^{N_B} H(X_{v^B_k} \mid X_A, X_{B_{k-1}}),   (26)

we can rewrite the excess entropy as the sum of the mutual information I(X_A : X_B) and of conditional mutual informations. The conditional mutual information is always nonnegative, and therefore

E(X_{A \cup B}) \geq I(X_A : X_B).   (27)

To compare the excess entropy of the whole system with the excess entropy of one of its parts we have, using the same notation,

E(X_{A \cup B}) = E(X_A) + \sum_{v \in A} I(X_v : X_B \mid X_{A \setminus \{v\}}) + H(X_B \mid X_A) - \sum_{v \in B} H(X_v \mid X_{(A \cup B) \setminus \{v\}}).   (29)

By writing H(X_B \mid X_A) as a sum according to (26), the last two terms can be expressed as a sum of conditional mutual informations, so the right-hand side is E(X_A) plus nonnegative terms. Thus E(X_{A \cup B}) \geq E(X_A) and, similarly, E(X_{A \cup B}) \geq E(X_B). By combining equations (27) and (29) we can also relate the excess entropy of the system to the sum of the excess entropies of its subsystems.

Appendix B: TSE-complexity and excess entropy
It was already noted in [10] that the last summand in (12) is given by the excess entropy (up to the factor 1/N). But it is also possible to express the whole sum as a sum over excess entropies: applying the decomposition first at the level of the whole system and then again for all subsystems of size N-1, of size N-2, and so on, yields a recursion. Here we use that, for k \leq N-k, averaging the entropy of subsystems of size k over the whole system gives the same result as first averaging within a subsystem of size N-k \geq k and then averaging over all these subsystems. Repeating the recursion, rearranging the sums and summing up leads to equation (10).

Appendix C: TSE-complexity with an additional element
The starting point is again the TSE-complexity expressed through the average subset entropies (12). After increasing the system by one additional element v', we have

C_{TSE}(X_{V \cup \{v'\}}) = \sum_{k=1}^{N+1} \left[ H(k, N+1) - \frac{k}{N+1} H(N+1, N+1) \right].

Now we use that the averages over the enlarged system split, according to (13), into averages over the old subsets and averages over the new subsets containing v':

H(k, N+1) = \binom{N+1}{k}^{-1} \left[ \binom{N}{k} H(k, N) + \binom{N}{k-1} H_{v'}(k) \right],

where H_{v'}(k) denotes the mean entropy of the size-k subsets that contain v'. Introducing, for any subset Y of size k-1, the conditional mutual information I(X_{v'} : X_{V \setminus Y} \mid X_Y), and using that there are \binom{N}{k-1} subsets of the original system containing k-1 elements, one obtains after rearranging the summation the result (14) for the behaviour of the TSE-complexity after adding one element.

Appendix D: TSE-complexity of the sum of two independent systems
How does the complexity measure of Tononi, Sporns and Edelman decompose if a system V is composed of two independent subsystems A and B, i.e. V = A \cup B and I(X_A : X_B) = 0? In the following, the number of elements in V will be denoted by N; for the subsystems, N_A and N_B will be used, respectively.

Starting from the representation of the TSE-complexity as a weighted sum of the mean excess entropies E(k, N), where E(k, N) is the average excess entropy over all subsets of cardinality k, and using the fact that the excess entropy is additive for independent subsystems (17), one obtains for every subset Y \subseteq V the decomposition E(X_Y) = E(X_{Y \cap A}) + E(X_{Y \cap B}). The TSE-complexity of X can therefore be decomposed into contributions from subsets of A and from subsets of B. Comparing these contributions with the corresponding terms in the TSE-complexities of A and B alone shows that the two expressions differ in their weights by a factor² depending only on the cardinality of the subset. If this factor were independent of k, the TSE-complexity of X could be written directly in terms of the TSE-complexities of A and B. We can show (see Appendix F) that the relevant sum is equal to (N_A + N_B + 1)/(N_A + 1), which yields the decomposition (18).

² Since this factor just depends on the cardinality of the specific subset, k will be used in the following to denote the cardinality of Y_A.

Appendix E: TSE-complexity of two identical copies
Now consider the situation where the system consists of two identical copies of one smaller system, i.e. V = A \cup B with p(x_A, x_B) = p(x_A) \delta_{x_A}(x_B). In this case each subset Y of V can be considered to be composed of two subsets, Y_A \subseteq A and Y_B \subseteq B. Furthermore, each subset Y_B of B corresponds to a subset Y_A^B of A. Since V is a disjoint union of A and B, one has |Y| = |Y_A| + |Y_B|. Because B is a copy of A, the entropy of Y is given by

H(X_Y) = H(X_{Y_A \cup Y_A^B}).

Decomposing the TSE-complexity of X accordingly, in analogy to Appendix D, leads to the result (21).

Appendix F: Evaluation of the sum (31)
In the following we show by induction that the sum (31) is equal to (N_A + N_B + 1)/(N_A + 1).