Information geometry of scaling expansions of non-exponentially growing configuration spaces


Abstract. Many stochastic complex systems are characterized by the fact that their configuration space doesn't grow exponentially as a function of the degrees of freedom. The use of scaling expansions is a natural way to measure the asymptotic growth of the configuration space volume in terms of the scaling exponents of the system. These scaling exponents can, in turn, be used to define universality classes that uniquely determine the statistics of a system. Every system belongs to one of these classes. Here we derive the information geometry of scaling expansions of sample spaces. In particular, we present the deformed logarithms and the metric in a systematic and coherent way. We observe a phase transition for the curvature. The phase transition can be well measured by the characteristic length r, corresponding to a ball with radius 2r having the same curvature as the statistical manifold. A characteristic length that increases with system size is associated with sub-exponential sample space growth, which occurs in strongly constrained and correlated complex systems. A decreasing characteristic length corresponds to super-exponential sample space growth, which occurs, for example, in systems that develop structure as they evolve. Constant curvature means exponential sample space growth, which is associated with multinomial statistics, where traditional Boltzmann-Gibbs, or Shannon, statistics applies. This allows us to characterize transitions between statistical manifolds corresponding to different families of probability distributions.

1 Introduction

Statistical physics has been remarkably successful in helping us understand, describe, and predict the statistical properties of a plethora of different complex systems; see e.g., [1] for an overview. While the microscopic nature of complex systems can be very different, their statistical properties often have common features across various systems.
Entropy is undoubtedly the key concept in statistical physics that connects the statistical description of microscopic dynamics with the macroscopic thermodynamic properties of a system. The notion of entropy has also been adopted in other contexts, such as information theory or statistical inference, which are concepts quite different from thermodynamics [2]. One elegant and powerful concept arising from the theory of statistical inference is that of information geometry [3,4]. It applies ideas from differential geometry to probability theory and statistics. In this context, the concept of entropy also plays a crucial role, since the metric on the statistical manifold is derived from the corresponding (relative) entropy. This so-called Fisher-Rao metric enables us to analyze statistical systems from a different perspective. For example, one can study critical transitions by calculating singularities of the metric [5].
In information geometry, most attention has focused on systems that are governed by Shannon entropy [3,4]. However, it is well known that many complex systems, especially strongly correlated or constrained systems, or systems with emergent components, cannot be described within the framework of Shannon entropy [1]. For this reason, a number of generalizations of Shannon entropy have been proposed, e.g., in connection with power laws [6,7], special relativity [8], multifractal thermodynamics [9], or black holes [10,11].
To classify entropies for stochastic systems of various kinds, it is natural to start with the information-theoretic foundations of Shannon entropy, i.e. the so-called Shannon-Khinchin (SK) axioms [12,13]. The first three SK axioms are usually formulated as:
-(SK1) Entropy is a continuous function of the probabilities $p_i$ only.
-(SK2) Entropy is maximal for the uniform distribution, $p_i = 1/W$.
-(SK3) Adding a state $W+1$ to a system with $p_{W+1} = 0$ does not change the entropy of the system.
The fourth axiom is called the composability axiom and determines the entropy functional uniquely:

$$H(A \cup B) = H(A) + \sum_k p_k\, H(B|A_k),$$

where $H(B|A_k)$ is the entropy of the conditional probability $p_{B|A_k}$. In this formulation, the unique solution that is compatible with SK1-4 is Shannon entropy, $H(P) = -\sum_i p_i \log p_i$. When the fourth axiom is relaxed, one obtains a wider class of entropic functionals. First generalizations of the fourth axiom were introduced in connection with generalized additivity [14,15], group laws [16], or statistical inference [17]. These approaches are somewhat limited in scope, since they all lead to a class of entropies that can be expressed as a function of Tsallis entropy [6]. The relaxation of SK4 also naturally leads to a classification scheme of complex systems [18,19]. The main idea of this approach is to study the asymptotic scaling exponents of the entropy functional that are associated with a particular system's configuration space. Such systems typically have a sub-exponentially growing configuration space, when seen as a function of the degrees of freedom. This classification scheme is based on a mathematical analysis of the asymptotic scaling of the entropic functionals that satisfy the first three SK axioms.
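As a minimal numerical illustration of the first axioms (the code and function names are ours, not from the paper), the following Python sketch checks SK2 and SK3 for Shannon entropy on a small distribution:

```python
import math

def shannon_entropy(p):
    """Shannon entropy H(P) = -sum_i p_i log p_i (natural log, k_B = 1)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# SK2: the uniform distribution maximizes the entropy for fixed W
W = 4
uniform = [1.0 / W] * W
tilted = [0.4, 0.3, 0.2, 0.1]
assert shannon_entropy(uniform) >= shannon_entropy(tilted)

# SK3: adding a state with zero probability leaves the entropy unchanged
assert abs(shannon_entropy(tilted + [0.0]) - shannon_entropy(tilted)) < 1e-12

print(shannon_entropy(uniform), math.log(W))  # maximum is log W
```

The maximum value $\log W$ recovered for the uniform distribution is the Boltzmann entropy of the microcanonical ensemble used throughout the paper.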
Since the configuration space of most complex systems does not grow exponentially (as in the case of Shannon entropy), but polynomially [7], as a stretched exponential [20], or even super-exponentially [21], the appropriate scaling behavior of the entropic functional is crucial for a proper thermodynamic interpretation. To this end, we use the recently developed scaling expansion [22], which is a special case of a Poincaré asymptotic series [23], whose coefficients are the scaling exponents of the system. The aim of this paper is to define a generalization of Shannon entropy that matches the appropriate asymptotic scaling of a given system, and to use it to derive the associated generalized Fisher-Rao metric of the underlying statistical manifold. For this we use the framework of deformed logarithms [35,25]. It has been shown recently [26] that one can naturally obtain two types of information metric within that framework: one corresponding to the maximum entropy principle with linear constraints, and the other corresponding to the maximum entropy principle with so-called escort constraints instead of ordinary (linear) constraints.
Escort distributions appeared in connection with chaotic systems [27], and were discussed in the context of superstatistics [28,29]. Later it became possible to relate them to linear constraints through a log-duality [30]. Interestingly, escort distributions also appear as canonical coordinates in information geometry [31,32]. In this paper, we use both the linear and the escort approach and compare the corresponding metric tensors and their invariants. We focus particularly on the microcanonical ensemble in the thermodynamic limit, since the metric should correspond to the system's asymptotic properties, given by its characteristic structure. Some partial results for the curvature of the escort metric have recently been obtained in this direction [34]. However, no systematic and analytically expressible results for the metric tensor and its scalar curvature have been obtained so far. We show that the curvature of the statistical manifold naturally distinguishes between three types of systems: systems with sub-exponentially growing configuration or sample space (correlated and constrained systems), exponentially growing sample space (equivalent to ordinary multinomial statistics), and super-exponentially growing sample space (e.g. systems that develop emergent structures as they evolve). The vector of scaling exponents plays the role of a set of order parameters, i.e., it measures the distance from the phase transition between the sub-exponential and super-exponential phases.
The paper is organized as follows: Section 2 introduces the scaling expansion and shows how to calculate the corresponding scaling exponents. We discuss several systems with non-trivial scaling exponents. In the last part of the section we establish a representation of universality classes for complex systems by introducing scaling vectors and their basic operations. In Section 3, we briefly revisit the results of information geometry in the framework of φ-deformed logarithms. We focus on information geometry with both linear and escort constraints. The main results of the paper are derived in Section 4, where we define the appropriate generalized logarithm by combining the φ-deformation framework and the requirement of asymptotic scaling. The properties of the corresponding entropic functionals are discussed. We exemplify the whole approach with the simple, yet very general, class of entropies with one correction term from the scaling expansion, and calculate the asymptotic behavior of the scalar curvature of the microcanonical ensemble in the thermodynamic limit. The last section draws conclusions. Several appendices contain the technical details.

2 Scaling expansion of the volume of configuration space
The scaling expansion [22] is a method to investigate the asymptotic scaling behavior of a sample space volume, $W(N)$. Here $W$ is the number of accessible states in a system, and $N$ indicates the size of the system 3 . The scaling expansion is a special case of the Poincaré asymptotic series, where the coefficients correspond to the scaling exponents of the system. We introduce the notation $f^{(n)}(x) = f(f^{(n-1)}(x))$, with $f^{(0)}(x) = x$, for the iterated use of functions, to define a set of re-scaling operations,

$$r^{(n)}_\lambda(x) = \exp^{(n)}\!\big(\lambda \log^{(n)}(x)\big).$$

This set of re-scaling operations contains the well-known multiplicative re-scaling, $x \to \lambda x$ ($n = 0$), the power re-scaling $x \to x^\lambda$ ($n = 1$), and the additive re-scaling $x \to x + \log\lambda$ ($n = -1$). For each $n$, $r^{(n)}$ is a representation of the multiplicative group $(\mathbb{R}^+, \times)$, i.e., $r^{(n)}_\lambda \circ r^{(n)}_{\lambda'} = r^{(n)}_{\lambda\lambda'}$. We now investigate how a function $W(N)$ scales under the re-scaling $N \to r^{(n)}_\lambda(N)$. Note that due to a simple theorem (see Appendix A2 in [22]) the function $z(\lambda)$, defined as $z(\lambda) = \lim_{N\to\infty} W(r^{(n)}_\lambda(N))/W(N)$, must have the form $z(\lambda) = \lambda^c$ for $c \in \mathbb{R}\cup\{\pm\infty\}$ whenever the limit exists. We start with multiplicative scaling ($n = 0$): according to the theorem, the ratio $W(\lambda N)/W(N)$ tends to $\lambda^{c_0}$. We assume that $W(N)$ is a strictly increasing function; it then follows that $c_0 \ge 0$ 4 . It can happen that $c_0 = +\infty$; in that case, $W(N)$ grows faster than any polynomial. This problem can be resolved by using $\log^{(l)}(W(N))$ instead of $W(N)$, for an appropriate choice of $l$. The parameter $l$ is chosen such that

$$\lim_{N\to\infty} \frac{\log^{(l)} W(\lambda N)}{\log^{(l)} W(N)} = \lambda^{c_0^{(l)}}, \qquad 0 < c_0^{(l)} < +\infty.$$

We call $l$ the order of the process. We thus get that $W(N) \sim \exp^{(l)}(N^{c_0^{(l)}})$, for $N \gg 1$. To get the corrections to the leading order, we factor out the leading term and apply the re-scaling with $n = 1$ to the remainder, which yields the second scaling exponent $c_1^{(l)}$. One can continue along the same lines to obtain the asymptotic expansion of $W(N)$, which reads

$$W(N) \sim \exp^{(l)}\!\Big( N^{c_0^{(l)}} \prod_{j=1}^{n} \big(\log^{(j)} N\big)^{c_j^{(l)}} \Big),$$

where $c_j^{(l)}$ are the characteristic scaling exponents. The scaling expansion of $\log^{(l)} W(N)$ can therefore be written as

$$\log^{(l)} W(N) \sim N^{c_0^{(l)}} \prod_{j=1}^{n} \big(\log^{(j)} N\big)^{c_j^{(l)}}.$$

It can be shown that the scaling exponents can be calculated from $W(N)$ as

$$c_j^{(l)} = \lim_{N\to\infty} \frac{\log^{(l+1)} W(N) - \sum_{k=0}^{j-1} c_k^{(l)} \log^{(k+1)} N}{\log^{(j+1)} N}.$$

3 For example, think of N as the number of particles in a system, or the number of throws in a coin tossing experiment. 4 Details about processes with reducing sample space can be found e.g., in Refs. [40,41,42,43].
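The leading scaling exponent can be estimated numerically as the slope of $\log^{(l+1)} W(N)$ with respect to $\log N$. A minimal Python sketch (ours, not from the paper; we work directly with $\log^{(l+1)} W$ to avoid overflow):

```python
import math

def leading_exponent(logl1_W, N, lam=2.0):
    """Estimate c_0^{(l)} as the finite-difference slope of log^{(l+1)} W(N)
    with respect to log N. logl1_W(N) must return log^{(l+1)} W(N)."""
    return (logl1_W(lam * N) - logl1_W(N)) / math.log(lam)

N = 1e8
# RW: W(N) = 2^N, order l = 1, so log^{(2)} W(N) = log(N log 2)
c0_rw = leading_exponent(lambda n: math.log(n * math.log(2)), N)
# ARW: W(N) ~ 2^sqrt(N), order l = 1, so log^{(2)} W(N) = log(sqrt(N) log 2)
c0_arw = leading_exponent(lambda n: math.log(math.sqrt(n) * math.log(2)), N)
print(c0_rw, c0_arw)  # -> approx 1.0 and 0.5
```

The exponential random walk gives $c_0^{(1)} = 1$, while the sub-exponential (stretched-exponential) growth gives $c_0^{(1)} = 1/2$, in line with the expansion above.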
As a next step, we apply the scaling expansion to obtain the corresponding extensive entropy functionals. It is well known that for complex systems (with sub- or super-exponential phase space growth) the Shannon-Boltzmann-Gibbs entropy is not an extensive quantity. To obtain an extensive expression for such systems, one can introduce an appropriate generalization of the entropy functional [1]. A natural way to characterize thermodynamic entropy is to define an entropy functional $S(W)$ that is extensive. For the microcanonical ensemble, this requirement can be expressed as $S(W(N)) \sim N$ for $N \to \infty$. For the purpose of thermodynamics, we do not have to require exact extensivity (with equality sign), but only this weaker asymptotic version. We consider the general trace-form entropy functional

$$S(P) = \sum_{i=1}^{W} g(p_i).$$

The scaling expansion of the extensive entropy in the microcanonical ensemble, $S(W) = W g(1/W)$, can be expressed as

$$S(W) \sim \prod_{j=0}^{n} \big(\log^{(l+j)} W\big)^{d_j},$$

and, equivalently, the scaling expansion of $g(x)$ for $x \to 0^+$ is $g(x) \sim x \prod_{j=0}^{n} \big(\log^{(l+j)}(1/x)\big)^{d_j}$. The scaling coefficients $d_j$ are the entropy scaling exponents. The requirement of extensivity determines the relation between the scaling exponents $c_j^{(l)}$ and $d_j$:

$$d_0 = \frac{1}{c_0^{(l)}}, \qquad d_j = -\frac{c_j^{(l)}}{c_0^{(l)}} \quad \text{for } j \ge 1. \qquad (8)$$

Examples of systems with different scaling exponents. The first example is a random walk (RW) on a discrete one-dimensional lattice with two possible steps: left or right. The space of all possible paths grows exponentially, $W_{\rm RW}(N) = 2^N \sim \exp(N)$, and we obtain the formula for Boltzmann entropy, $S_{\rm RW} = \log W_{\rm RW}$ ($k_B = 1$). Now consider an aging random walk (ARW) [19], where the walker takes one step in a random direction, followed by two steps in a random direction, followed by three steps, etc. In this case, the sample space grows sub-exponentially, $W_{\rm ARW}(N) \sim 2^{\sqrt{N}}/2$, and $S_{\rm ARW} = (\log W_{\rm ARW})^2$. The next example is the magnetic coin model (MC) [21], where each coin can be in two states, head or tail, but two coins can also stick together and create a bond state.
It can be shown that the corresponding sample space grows super-exponentially, $W_{\rm MC}(N) \sim N^{N/2}$ to leading order, and one can conclude that the corresponding extensive entropy is asymptotically equivalent to $S_{\rm MC} = \log W_{\rm MC} / \log\log W_{\rm MC}$. Another example of a super-exponential process are random networks (RN), whose sample space grows as $W_{\rm RN} = 2^{\binom{N}{2}}$, and thus $S_{\rm RN} = (\log W_{\rm RN})^{1/2}$. The final example is the double-exponential growth of the random walk cascade (RWC), where the walker can take a step to the right, to the left, or split into two independent walkers [22]. For this we get $W_{\rm RWC} = 2^{2^N - 1}$, and $S_{\rm RWC} = \log\log W_{\rm RWC}$. In Fig. 1 we show the parameter space of entropies given by three scaling exponents $(d_0, d_1, d_2)$. The above examples are indicated as points. In Fig. 1(a) the plane of the first two scaling exponents is shown, as presented in [18]. We see that if one uses only the first two exponents, some super-exponential processes are not properly represented. Adding a third scaling exponent solves this problem, Fig. 1(b). So far, we have not found simple examples that need more than three scaling exponents.
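The asymptotic extensivity $S(W(N)) \sim N$ of these generalized entropies can be checked numerically through the ratio $S(W(2N))/S(W(N)) \to 2$. A small Python sketch for the ARW and RN examples (function names are ours):

```python
import math

# log W(N) for two of the example processes (natural logarithms)
logW_arw = lambda N: math.sqrt(N) * math.log(2)       # ARW: W ~ 2^sqrt(N)
logW_rn  = lambda N: (N * (N - 1) / 2) * math.log(2)  # RN:  W = 2^binom(N,2)

S_arw = lambda N: logW_arw(N) ** 2    # S_ARW = (log W)^2
S_rn  = lambda N: logW_rn(N) ** 0.5   # S_RN  = (log W)^(1/2)

# extensivity: doubling the system size asymptotically doubles the entropy
N = 10**6
print(S_arw(2 * N) / S_arw(N))  # -> 2.0 (exactly, since S_ARW = N log(2)^2)
print(S_rn(2 * N) / S_rn(N))    # -> approx 2.0
```

The naive Boltzmann entropy $\log W$ would instead give ratios $\sqrt{2}$ and $4$ for these two processes, which is precisely why the generalized functionals are needed.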

Universality classes for scaling expansions
Scaling expansions define universality classes of statistical complex systems according to the set of scaling exponents of their sample space [22]. The representation of the sample space volume, $W(N)$, by its scaling expansion can be used to uniquely describe the statistical properties in the thermodynamic limit.
Consider a function $c(x)$ represented by its scaling expansion

$$c(x) \sim \exp^{(l)}\!\Big( \prod_{j=0}^{n} \big(\log^{(j)} x\big)^{c_j} \Big), \qquad \log^{(0)} x \equiv x.$$

Its scaling exponents can be collected in the scaling vector

$$C = (l;\, c_0, c_1, \dots, c_n). \qquad (10)$$

In principle, the scaling vector can be infinite; however, typically after several terms the corrections are either zero or do not contribute significantly. The parameter $n$ denotes the number of corrections.
Let $a(x)$ and $b(x)$ be two functions whose respective scaling expansions are determined by the two vectors of scaling exponents

$$A = (l_a;\, a_0, a_1, \dots, a_n), \qquad B = (l_b;\, b_0, b_1, \dots, b_n). \qquad (11)$$

Without loss of generality, $n$ can be the same for both vectors, because one can always append zeros to the shorter vector. We can now define the equivalence relation

$$a(x) \sim b(x) \;\Leftrightarrow\; A = B,$$

as well as a natural ordering

$$a(x) \prec b(x) \;\Leftrightarrow\; A < B,$$

where the symbol $<$ is used in the lexicographic sense, i.e., vectors are compared first by the order $l$, then by $c_0$, then by $c_1$, and so on. For every vector $C$ we define the corresponding entropy scaling vector $D$, denoted by $D = C^{-1}$, that is obtained from Eq. (8) by the requirement of extensivity. One can define analogous relations for $D$ through the relations for the corresponding vectors $C$: for entropy scaling vectors $E$ and $F$, we say that $E < F$ whenever $E^{-1} < F^{-1}$. Note that for the sub-leading scaling exponents the inequality is reversed, which is a result of Eq. (8). Additionally, one can define basic algebraic operations on the scaling vectors, such as generalized addition or a derivative operator. More details can be found in Appendix A.

Let us make an important note. As discussed in [22], the SK axioms set requirements on the admissible set of scaling exponents. From SK2 we get that $d_l \equiv d_0^{(l)} > 0$. This means that one can use a representation without specifying $l$, with an appropriate number of zeros at the beginning. This is useful, for example, for plots in the parametric space, where it is possible to plot processes of different order $l$ (as e.g., in Fig. 1). However, one has to keep in mind that this representation can be misleading, in the sense that the limit $d_l \to 0$ does not have a clear meaning, since it changes the order of the process. This can be nicely seen in the example of Tsallis entropy [6], for which $S_q(W) \sim W^{1-q}$ for $q < 1$, which can be formulated in terms of entropy scaling vectors as $D_q = (0;\, 1-q)$. Interestingly, the limit from above, $q \to 1^+$, is even more pathological. In this case the scaling vector corresponding to $S_q(P)$ for $q > 1$ is $(0; 0)$, because $S_q(W) \sim W^{1-q} + 1 \sim W^0$. These pathologies have their origin in the non-commutativity of limits, $\lim_{N\to\infty} \lim_{d_l \to 0} \neq \lim_{d_l \to 0} \lim_{N\to\infty}$. The limit $d_l \to 0$ depends on the particular representation of the extensive entropy.
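The lexicographic ordering with zero-padding can be sketched in a few lines of Python; the representation of a scaling vector as a plain list $(l, c_0, c_1, \dots)$ is our illustrative choice:

```python
def compare(A, B):
    """Lexicographic comparison of scaling vectors (l; c0, c1, ...).
    Shorter vectors are padded with zeros. Returns -1, 0, or 1."""
    la, *a = A
    lb, *b = B
    n = max(len(a), len(b))
    a += [0.0] * (n - len(a))
    b += [0.0] * (n - len(b))
    ka, kb = (la, *a), (lb, *b)
    return (ka > kb) - (ka < kb)

# C vectors of the examples: ARW (sub-exp.), RW (exp.), RWC (super-exp.)
C_arw, C_rw, C_rwc = [1, 0.5], [1, 1], [2, 1]
assert compare(C_arw, C_rw) == -1      # sub-exponential < exponential
assert compare(C_rwc, C_rw) == 1       # super-exponential > exponential
assert compare([1, 1], [1, 1, 0]) == 0  # padding with zeros
```

Comparing the order $l$ first reproduces the fact that any process of higher order dominates every process of lower order, regardless of the exponents.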
3 Information geometry of φ-deformations

Information geometry plays a central role in the theory of information as well as in statistical inference. It allows one to study the structure of the statistical manifold by means of differential geometry. We derive the information-geometric properties of the scaling expansion in the framework of the φ-deformed logarithms introduced in [35,25]. The φ-deformation is a generalization of the logarithmic function. It can subsequently be used to establish a connection with information theory, where the logarithm plays the role of a natural information measure (Hartley information). The φ-deformed logarithm is defined by a positive, strictly increasing function $\phi(x)$ on $(0,+\infty)$ as

$$\log_\phi(x) = \int_1^x \frac{\mathrm{d}t}{\phi(t)}.$$

Hence, $\log_\phi$ is an increasing concave function with $\log_\phi(1) = 0$. For $\phi(x) = x$, we obtain the ordinary logarithm. Naturally, $\log_\phi(x) < 0$ for $x \in (0,1)$. The inverse function of $\log_\phi$, the so-called φ-exponential $\exp_\phi$, is an increasing and convex function. This enables one to define the parametric φ-exponential family of probability distributions as

$$p_\theta(x) = \exp_\phi\!\Big( \sum_i \theta^i x_i - \Psi(\theta) \Big),$$

where the function $\Psi(\theta)$ is called the Massieu function and normalizes the distribution. As discussed in [26], there are two natural ways to make a connection with the theory of information through the maximum entropy principle. The first is based on the maximization of the entropy functional under linear (thermodynamic) constraints, the second on maximization under so-called escort (or geometric) constraints. Both approaches lead to the φ-exponential family. The former approach defines the φ-deformed entropy as [25]

$$S^N_\phi(P) = -\sum_i \int_0^{p_i} \log_\phi(x)\, \mathrm{d}x,$$

which is maximized by the φ-exponential family under linear constraints, i.e., constraints of the type $\langle E \rangle = \sum_i p_i E_i$. In information geometry, escort distributions play the special role of dual coordinates on statistical manifolds [33]. They can be defined through φ-deformations as

$$\rho_i = \frac{\phi(p_i)}{\sum_j \phi(p_j)}.$$

It can be shown that the entropy $S^A_\phi$ maximized by the φ-exponential family under escort constraints, i.e., constraints of the type $\langle E \rangle_\rho = \sum_i \rho_i E_i$, can also be expressed in terms of $\log_\phi$; we refer to [26] for its explicit form. Both approaches can be linked to information geometry, i.e., used to derive a generalization of the Fisher information metric. This can be done through a divergence (or relative entropy) of Bregman type, which is defined as

$$D_f(p\|q) = f(p) - f(q) - \langle \nabla f(q),\, p - q \rangle,$$

where $\langle\cdot,\cdot\rangle$ denotes the inner product. Alternatively, one can use a divergence of Csiszár type, but its information geometry is trivial, because it is conformal to the ordinary Fisher information geometry; see e.g., Refs. [26,35]. Let us consider a parametric family of distributions $p(\theta)$. The Fisher information metric of this family at a point $\theta_0$ can be calculated as

$$g_{ij}(\theta_0) = \left. \frac{\partial^2}{\partial\theta_i \partial\theta_j} D_f\big(p(\theta)\,\|\,p(\theta_0)\big) \right|_{\theta=\theta_0}.$$

Let us consider a discrete probability distribution $\{p_i\}_{i=0}^n$. The normalization is given by $\sum_{i=0}^n p_i = 1$, so we consider $p_1, \dots, p_n$ as independent variables, while $p_0$ is determined from $p_0 = 1 - \sum_{i=1}^n p_i$. We parameterize this probability simplex by a φ-deformed exponential family. For the entropy $S^N_\phi$, the generating function of the Bregman divergence is $f^N_\phi(p) = \sum_i \int_0^{p_i} \log_\phi(x)\,\mathrm{d}x$, while for $S^A_\phi(p)$ the generating function is expressed in terms of the escort distribution. After a straightforward calculation, one obtains the metric of Naudts type, $g^N_\phi$, and the metric of Amari type, $g^A_\phi$, respectively [26]. As a result, for a given φ-deformation there are two types of metric on the information manifold. Note that it is natural to consider a one-parametric class of affine connections for which one obtains the so-called dually-flat structure, whose Christoffel coefficients vanish [33]. This structure is useful in information geometry; however, we stick to the well-known Levi-Civita connection (which can be obtained as a special case of a dually-flat connection, since the Levi-Civita connection is the only self-dual connection [4]), because the metric is non-vanishing. Thus, the corresponding invariants, such as the scalar curvature, are non-trivial and reveal information about the statistical manifold.
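As a consistency check of the definition of $\log_\phi$, the following Python sketch (ours) evaluates the integral numerically for the Tsallis deformation $\phi(x) = x^q$ and compares it with the known closed form $\ln_q(x) = (x^{1-q}-1)/(1-q)$, a standard example not specific to this paper:

```python
import math

def log_phi(x, phi, steps=100000):
    """phi-deformed logarithm log_phi(x) = int_1^x dt/phi(t), midpoint rule."""
    h = (x - 1.0) / steps
    return sum(h / phi(1.0 + (k + 0.5) * h) for k in range(steps))

q = 0.5
phi_q = lambda t: t ** q                        # Tsallis deformation
ln_q = lambda x: (x ** (1 - q) - 1) / (1 - q)   # known closed form

x = 2.0
print(log_phi(x, phi_q), ln_q(x))   # both approx 0.8284
print(log_phi(x, lambda t: t))      # phi(x) = x recovers log(2) approx 0.6931
```

For $\phi(x) = x$ the ordinary logarithm is recovered, as stated above; any positive increasing $\phi$ can be plugged in the same way.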
Let us now focus on the scalar curvature corresponding to the metric tensor, $R_\phi = g^{ik}_\phi g^{lj}_\phi R_{\phi,ilkj}$, in the thermodynamic limit $N \to \infty$. We focus on the microcanonical ensemble, i.e., we consider $p_i = 1/W$. We assume no prior information about the system or its dynamics, so all states are equally probable.
It is possible to show, in a technical but straightforward calculation, that the scalar curvature can be expressed as (see also [38,39])

$$R_\phi = \frac{(W-1)(W-2)}{(2 r_\phi)^2},$$

which corresponds to the scalar curvature of a $W$-dimensional ball of radius $2 r_\phi$. The function $r_\phi$ depends only on the form of the φ-deformation; we call $r_\phi$ the characteristic length. For the Amari metric and for the Naudts metric, the characteristic length can be expressed in closed form in terms of φ and its derivatives, Eqs. (33) and (34), respectively.
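The "ball of radius $2r_\phi$" picture can be verified numerically in the undeformed case $\phi(x) = x$: the Fisher-Rao simplex of $W = 3$ states is known to be a piece of a sphere of radius 2, whose scalar curvature is $1/2$. The following Python sketch (our own finite-difference implementation, not from the paper) computes the scalar curvature of the Fisher-Rao metric on the 2-simplex:

```python
import numpy as np

def metric(x):
    """Fisher-Rao metric on the 2-simplex in coordinates (p1, p2),
    with p0 = 1 - p1 - p2: g_ij = delta_ij / p_i + 1/p0."""
    p1, p2 = x
    p0 = 1.0 - p1 - p2
    return np.array([[1/p1 + 1/p0, 1/p0], [1/p0, 1/p2 + 1/p0]])

def christoffel(x, h=1e-6):
    ginv = np.linalg.inv(metric(x))
    dg = np.zeros((2, 2, 2))  # dg[l, i, j] = d g_ij / d x_l
    for l in range(2):
        e = np.zeros(2); e[l] = h
        dg[l] = (metric(x + e) - metric(x - e)) / (2 * h)
    Gamma = np.zeros((2, 2, 2))  # Gamma[k, i, j] = Gamma^k_ij
    for k in range(2):
        for i in range(2):
            for j in range(2):
                Gamma[k, i, j] = 0.5 * sum(
                    ginv[k, l] * (dg[i, l, j] + dg[j, l, i] - dg[l, i, j])
                    for l in range(2))
    return Gamma

def scalar_curvature(x, h=1e-5):
    ginv = np.linalg.inv(metric(x))
    dG = np.zeros((2, 2, 2, 2))  # dG[l, k, i, j] = d Gamma^k_ij / d x_l
    for l in range(2):
        e = np.zeros(2); e[l] = h
        dG[l] = (christoffel(x + e) - christoffel(x - e)) / (2 * h)
    G = christoffel(x)
    Ric = np.zeros((2, 2))  # Ricci tensor in the standard convention
    for i in range(2):
        for j in range(2):
            Ric[i, j] = sum(dG[k, k, i, j] - dG[j, k, i, k] for k in range(2))
            Ric[i, j] += sum(G[k, k, l] * G[l, i, j] - G[k, j, l] * G[l, i, k]
                             for k in range(2) for l in range(2))
    return float(np.einsum('ij,ij->', ginv, Ric))

# sphere of radius 2: R = 2/2^2 = 1/2, independent of the point
print(scalar_curvature(np.array([1/3, 1/3])))  # -> approx 0.5
```

The curvature is constant over the simplex, consistent with the constant-curvature regime of multinomial statistics discussed below.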

4 Information geometry of scaling expansions
Let us now consider an arbitrary φ-deformed logarithm. We show how to introduce a generalization of the logarithm with a given asymptotic scaling. In contrast to φ-deformations, we do not start with the definition of φ, but focus on the definition of the logarithm. We denote the desired logarithmic function by $\Lambda_D$. The requirements that $\Lambda_D$ should fulfil are:
- $\Lambda_D$ is an increasing, concave function on $(0, +\infty)$;
- normalization: $\Lambda_D'(1) = 1$;
- self-duality: $\Lambda_D(1/x) = -\Lambda_D(x)$;
- the asymptotic scaling of $\Lambda_D$ is given by the scaling vector $D$.

The requirements follow the properties of the ordinary logarithm. Particularly convenient is the self-duality requirement, from which we can directly calculate the asymptotic expansion around $0^+$. A direct consequence of self-duality is that $\Lambda_D(1) = 0$. Next, we want to find a representation that is simple, analytically expressible, and universal for any set of scaling exponents. Due to the self-duality requirement, we can focus on the interval $(1, +\infty)$ only; on $(0,1)$ the logarithm is defined by self-duality. To find an appropriate representation, we start from the scaling expansion itself. Unfortunately, the scaling expansion is not generally defined on the whole interval $(1, \infty)$, since the domain of $\log^{(l)}(x)$ is $(\exp^{(l-2)}(1), \infty)$. We can overcome this issue by adjusting the nested logarithm, replacing $\log \to 1 + \log$; we denote the nested logarithm by $\mu_l(x) = [1+\log]^{(l)}(x)$. Further, to be able to fulfil the normalization condition, we add a multiplicative constant to each nesting, so that for each order the corresponding term can be expressed as $\big(1 + r_j \log \mu_{l+j-1}(x)\big)^{d_j}$. Thus, the generalized logarithm can be expressed as

$$\Lambda_D(x) = \frac{1}{\rho}\left( \prod_{j=0}^{n} \big(1 + r_j \log \mu_{l+j-1}(x)\big)^{d_j} - 1 \right), \qquad x \ge 1,$$

extended to $(0,1)$ by self-duality. The logarithm automatically fulfils the condition $\Lambda_D(1) = 0$. The parameters $r_j$ define a set of scale parameters that influence the behavior at finite values, while the asymptotic properties are preserved. Since the overall scale is free, we can obtain the normalization of the derivative in several ways; we call a particular choice of the free parameters $r$ and $\rho$ a "calibration". The parameter $\rho$ can be determined by additional requirements. The first option is to require that $\Lambda_D$ is smooth enough, i.e., that it has at least a continuous second derivative. From the second derivative of the self-duality condition together with the normalization condition, we get $\Lambda_D''(1) = -1$. Following a straightforward calculation, we find an expression for $\rho_C$, the scale parameter in the smooth calibration; the free parameter $r$ can be used to ensure that $\rho_C$ is positive. Alternatively, we can simply consider $r_0 = 1$, which is useful for several applications. In this case, the scale parameter $\rho_L$ in the leading-order calibration follows directly from $\Lambda_D'(1) = 1$. Note that after a proper normalization, this calibration corresponds to the calibration used in [18,19]. Unless a continuous second derivative is explicitly required, it is more convenient to work with this simpler calibration.

We now turn our attention to the information geometry of $\Lambda_D$-deformations. We sketch the results for the scaling expansion with one correction; all technical details can be found in Appendix B. In Appendix C we show the calculation for arbitrary scaling vectors and calibrations, which is technically more involved but leads to the same results. We denote the scaling vector as $D = (l; c, d)$; the corresponding entropy has been studied for $l = 0$ in [18]. This inspires us to define the generalized logarithm as

$$\Lambda_{(l;c,d)}(x) = \frac{1}{\rho_L}\left( \mu_l(x)^{c}\, \big(1 + r \log \mu_l(x)\big)^{d} - 1 \right), \qquad \rho_L = c + d\,r,$$

which corresponds to the leading-order calibration with $r_0 = 1$. The logarithms are depicted in Fig. 2(a) for various scaling exponents. The inverse function, the deformed exponential, can be obtained in terms of the Lambert-W function, with the constant

$$B = \frac{c\,r}{1 - c\,r}\, \exp\!\left( \frac{c\,r}{1 - c\,r} \right),$$
and $\nu_l$ is the inverse function of $\mu_l$, i.e., $\nu_l(x) = [\exp(x-1)]^{(l)}$, so that $\nu_l(\mu_l(x)) = x$. Note that, depending on the values of $c$ and $d$, this deformed exponential contains the exponential, power laws, and stretched exponentials, respectively [1]. It is easy to see that the corresponding scaling vector of the exponential is $C = (l;\, 1/c,\, -d/c)$. The function $\phi_{(l;c,d)}(x)$ can be expressed as $\phi_{(l;c,d)}(x) = 1/\Lambda'_{(l;c,d)}(x)$. The escort distribution, $\rho_{(l;c,d)}(p) = \phi_{(l;c,d)}(p)/(\phi_{(l;c,d)}(p) + \phi_{(l;c,d)}(1-p))$, corresponding to the two-event distribution $(p, 1-p)$, is depicted in Fig. 2(b) for various scaling exponents. Interestingly, for $D < (1;1)$, i.e., for entropies corresponding to sub-exponential sample space growth, the escort distribution emphasizes high probabilities (generally $p > 1/N$), while for $D > (1;1)$, i.e., for super-exponential growth, it emphasizes low probabilities ($p < 1/N$). Let us finally show the asymptotic behavior of the curvature that corresponds to the deformed logarithm. It can easily be calculated if one keeps only the dominant contributions from each term in Eqs. (33) and (34); the case $l = 1$ and $c = 1$ requires a similar approximation at the next order. Similar results can be obtained for higher-order corrections. The behavior of $r$ for different scaling vectors is depicted in Fig. 3. We see that the asymptotic behavior is similar for both types of curvature; the only difference appears for smaller $N$. The same behavior is also obtained for higher-order corrections (see Appendix C). In conclusion, we find three distinct regimes for the statistical manifold with respect to the scaling vector:
- for $D < (1;1)$, i.e., sub-exponential sample space growth, the characteristic length $r_D(W)$ grows with the system size;
- for $D = (1;1)$, i.e., exponential growth, $r_D(W)$ is constant;
- for $D > (1;1)$, i.e., super-exponential growth, $r_D(W)$ decreases with the system size.

As a result, the curvature exhibits a phase transition: the statistical manifold in the thermodynamic limit flattens for sub-exponential processes, has constant sectional curvature for exponential processes, and becomes increasingly curved for super-exponential processes.
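The nested logarithm and its inverse can be sketched directly (function names are ours):

```python
import math

def mu(x, l):
    """Nested logarithm mu_l(x) = [1 + log]^(l)(x)."""
    for _ in range(l):
        x = 1.0 + math.log(x)
    return x

def nu(x, l):
    """Inverse of mu_l: the l-fold iteration of exp(x - 1)."""
    for _ in range(l):
        x = math.exp(x - 1.0)
    return x

# nu_l inverts mu_l for any nesting depth
for l in (0, 1, 2, 3):
    assert abs(nu(mu(5.0, l), l) - 5.0) < 1e-9

# mu_l fixes x = 1, which is what makes Lambda_D(1) = 0 in the
# product representation of the generalized logarithm
assert mu(1.0, 3) == 1.0
```

Because $\mu_l(x) = 1 + \log \mu_{l-1}(x)$ stays positive wherever $\mu_{l-1}(x) > 0$, the replacement $\log \to 1+\log$ indeed extends the domain to the whole interval $(1,\infty)$, as claimed above.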
While processes with exponentially growing sample spaces have (practically) independent sub-systems, sub-exponential processes impose restrictions and constraints on the sample space. Super-exponential processes are characterized by emergent structures in their sample spaces. The scaling vector plays the natural role of a set of order parameters. Let us finally note that the limit $W \to \infty$ is performed for $r_D(W)$. The "limit space" obtained in the limit of the statistical manifolds, for $W \to \infty$, might not be a smooth manifold, and the curvature might not correspond to the limit $\lim_{W\to\infty} R_D(W)$.

5 Conclusions and Perspectives
In this paper, we have defined a class of deformed logarithms with a given scaling expansion in the framework of φ-deformed logarithms. The corresponding entropy can be used to define the statistical manifold with generalized Fisher-Rao metric. We have shown that for the microcanonical ensemble in the thermodynamic limit, the scalar curvature exhibits a phase transition where the critical point is represented by the class of phenomena that are characterized by exponentially growing phase spaces. These include weakly interacting systems that are correctly described by Shannon entropy. The scaling vector of a given system naturally defines a set of order parameters. A possible explanation for this phenomenon is that the number of independent degrees of freedom grows slower than the size of the system for sub-exponential processes and faster for the super-exponential processes. This classification, however, does not appear for the case of the Fisher metric of Csiszár type, since the characteristic length is constant for every φ-deformation.
Contrary to the common approach in information geometry, where the statistical manifold corresponds to one functional family of distributions (e.g., the exponential family), this paper presents a parametric way to switch between different functional families of distributions (e.g., from power laws to stretched exponentials). This opens a novel connection between parametric and non-parametric information geometry and enables the classification of different types of statistical manifolds related to various classes of deformed exponential families.
It will be natural to extend these results to generalizations of the Bregman divergence enabling gauge invariance [44]. Moreover, we will focus on applying the results to the canonical ensemble, using the well-known results for the Fisher information metric on the thermodynamic manifold [45,46] in the case of complex systems, where the generalized form of the Boltzmann factor is needed [47]. It should also be possible to go beyond equilibrium statistical mechanics and extend the generalized Fisher metric to non-equilibrium scenarios [48].

A Basic algebra of scaling vectors
Let us discuss some definitions of ordinary operations on the space of scaling exponents. First, let us introduce the truncation of the scaling vector defined in Eq. (10),

$$C_{|k} = (l;\, c_0, c_1, \dots, c_k), \qquad k \le n.$$

Let us also add one set of inequality relations, particularly for the case when even the orders $l$ are not equal. For this we define
- Strong inequality relation: $a(x) \prec\prec b(x)$ if $l_a < l_b$, or if $l_a = l_b$ and the inequality already holds for the truncated vectors of leading exponents.

Let us now investigate representations of basic operations on the space of scaling exponents. Before that, let us define the rescaling of a general operator $O$ as

$$O^{(l)} = \exp^{(l)} \circ\, O \circ \log^{(l)}.$$

Let us denote the generalized addition as $a(x) \oplus^{(l)} b(x) = \exp^{(l)}\!\big(\log^{(l)} a(x) + \log^{(l)} b(x)\big)$ and the generalized multiplication as $a(x) \otimes^{(l)} b(x) = \exp^{(l)}\!\big(\log^{(l)} a(x) \cdot \log^{(l)} b(x)\big)$. Let us now consider, without loss of generality, that $a(x) \prec b(x)$. The scaling vector $C$ of $c(x) = a(x) \otimes^{(l)} b(x)$ can be expressed as follows:

$$C = \begin{cases} (l;\, a_0 + b_0, \dots, a_n + b_n), & \text{for } l = l_a = l_b;\\ B, & \text{for } l < l_a \le l_b \ \text{or}\ l = l_a < l_b;\\ \text{undefined}, & \text{for } l > l_a. \end{cases}$$

The scaling vector $C$ of the generalized composition $c(x) = \exp^{(l)} b(\log^{(l)} a(x))$ can be expressed as

$$C = \begin{cases} (l;\, a_0 b_0, a_1 b_0, \dots, a_n b_0), & \text{for } l_a = l;\\ (l_a + l_b;\, a_0, a_1, a_2, \dots, a_n), & \text{for } l < l_a;\\ \text{undefined}, & \text{for } l > l_a. \end{cases}$$
Finally, let us focus on the derivative of the scaling expansion. Let us denote the rescaled derivative operator as $\mathrm{D}^{(l)} = \exp^{(l)} \circ \tfrac{\mathrm{d}}{\mathrm{d}x} \circ \log^{(l)}$. The scaling vector corresponding to the rescaled derivative of $a(x)$ is

$$\begin{cases} (l_a;\, a_0 - 1, a_1, a_2, \dots, a_n), & \text{for } l_a = l;\\ A, & \text{for } l_a > l;\\ (l;\, -1, \dots, -1), & \text{for } l_a < l. \end{cases}$$

B Asymptotic curvature of (l; c, d)-logarithm

In this appendix, we calculate the asymptotic properties of the $(l;c,d)$-logarithm. Since $\Lambda_{(l;c,d)}(x) = \Lambda_{(c,d)}(\mu_l(x))$, the derivatives of the $(l;c,d)$-logarithm can be expressed in terms of the $(c,d)$-logarithm and $\mu_l$ by the chain rule. The derivatives of the nested logarithm $\mu_l(x) = [1+\log]^{(l)}(x)$ can be expressed as

$$\mu_l'(x) = \prod_{k=0}^{l-1} \frac{1}{\mu_k(x)}.$$

Let us then focus on the situation $l = 1$, $c = 1$. In this case the leading-order terms cancel, and we have to look at the first correction, given by the scaling exponent $d$. The curvature of both Amari and Naudts type can then be expressed asymptotically by keeping the dominant contribution of this correction.

C Fisher metric and scalar curvature corresponding to general logarithm

Let us now show the full calculation of the scalar curvature corresponding to the $\Lambda_D$-logarithm with an arbitrary scaling vector $D$ and constants $r_j$. Writing $\Lambda_D(x) = \frac{1}{\rho}\big(\prod_j \lambda_j(x) - 1\big)$ with $\lambda_j(x) = \big(1 + r_j \log \mu_{j+l-1}(x)\big)^{d_j}$, the first three derivatives of $\Lambda_D$ follow from the product rule for higher derivatives. The derivatives of $\lambda_j$ can be expressed by defining the function

$$L_j(x) = \frac{1}{\big(1 + r_j \log \mu_{j+l-1}(x)\big) \prod_{k=0}^{j+l-1} \mu_k(x)} = \frac{\mu'_{j+l}(x)}{1 + r_j \log \mu_{j+l-1}(x)}.$$

Then we can express

$$\lambda_j'(x) = \lambda_j(x)\, r_j d_j\, L_j(x),$$

with analogous, lengthier expressions, containing factors of the type $\big(r_i + (1 + r_i \log \mu_{i+l-1}(x))\big)$, for the higher derivatives. Finally, we plug the expressions for $\Lambda_D$ and its derivatives into Eqs. (33) and (34), and we end up with the same asymptotic behavior of the scalar curvature as for the one-correction case discussed in the main text.