Laws of Zipf and Benford, intermittency, and critical fluctuations

We describe precise equivalences between theoretical descriptions of: (i) size-rank and first-digit laws for numerical data sets, (ii) intermittency at the transition to chaos in nonlinear maps, and (iii) cluster fluctuations at criticality. The equivalences stem from a common statistical-mechanical structure that departs from the usual one via a one-parameter deformation of the exponential and logarithmic functions. The generalized structure arises when configurational phase space is incompletely visited, such that the accessible fraction has fractal properties. Thermodynamically, the common focal expression is an (incomplete) Legendre transform between two entropy (or Massieu) potentials. The theory is in quantitative agreement with real size-rank data and naturally includes the bends or tails observed for small and large rank.

The empirical laws of Zipf and Benford enjoy a unique place in the study of complex systems owing to their combination of omnipresence and simplicity. Zipf's law refers to the (approximate) power law displayed by sets of data when these are ranked by magnitude or by rate of recurrence [1]. Benford's law is a simple logarithmic rule for the frequency of first digits, also found in numerical data sets [2]. An explanation of their apparent universality has been offered recently [3,4] by exhibiting an underlying statistical-mechanical structure obeyed by the quantities employed in describing the empirical laws. Additionally, the same theoretical structure was found to hold in a very different problem, that of the simplest of the so-called routes out of chaos in low-dimensional mappings, namely the route via intermittency. At this transition from chaos to regular behavior, sequences of numerical values, called trajectories, obtained by consecutive iterations of a nonlinear map, change from irregular to periodic in a particular manner: approximately regular patterns, called laminar episodes, are separated by bursts of disorder [5]. Notably, there is a precise equivalence between this transition to chaos, known technically as the tangent bifurcation, and the rank-size and first-digit laws [4]. Furthermore, earlier work [6,7] has established the implications of yet another strict analogy between problems in different fields: that between the same tangent bifurcation in nonlinear dynamics and the nature of the transitory clusters of order parameter (e.g. magnetization) that occur at ordinary critical points in thermal systems. Here we put together these remarkable and exact analogies to emphasize that the parallelisms among these apparently very dissimilar problems stem from a common kind of statistical mechanics that arises when configurational phase space is incompletely visited in a strict way. 
Specifically, the restriction is that the accessible fraction of this space has fractal properties.
Thus, in section 1 we reproduce the theoretical expressions [3,4] for the size-rank and first-digit laws relevant to our purposes and describe the generalized statistical-mechanical structure we observe in them. In section 2 we present the parallelism between the ranking of data and the dynamics at the tangent bifurcation in nonlinear maps, and describe the finite-size effect of the former in terms of the off-tangency feature of the latter [4]. In section 3 we recall the theoretical properties of large clusters of order parameter at critical points in thermal systems and highlight their association with the properties of both the trajectories at the tangent bifurcation and the size-rank functions [6,7]. We conclude in section 4 with a short summary and discussion.

The laws of Benford and Zipf from a statistical-mechanical viewpoint
The relationship between the laws of Benford and Zipf was first analyzed in [8]. Benford's law, $p(n) = \log(1 + n^{-1})$, where p(n) is the relative probability of occurrence of the digit n, is obtained by integrating $P(N) \sim N^{-1}$, the underlying probability distribution for the data N under consideration, between consecutive digits n and n+1. The first step in [8] consisted of considering the more general case $P(N) \sim N^{-\alpha}$, α > 1. The next step was to obtain the rank k from P(N), this time as an integral of P(N) from N(k), the number that defines the rank k, to a finite number $N_{max}$ that corresponds to the first value of the rank k. In the limit $N_{max} \to \infty$ one obtains $N(k) \sim k^{-1/(\alpha-1)}$, which is Zipf's law with exponent 1/(α−1) when α > 1. For many sets of real data α ≈ 2, and the standard Zipf law corresponds to α = 2 [8].
In our work [3,4], the first, minor, step was to keep $N_{max}$ finite, but this led us to a basic conclusion on a possible physical origin of the laws of Zipf and Benford. That is, these laws represent general thermodynamic relations that belong to a special type of thermodynamic structure obtained from the usual one via a scalar deformation parameter represented by the power α. A further generalization of the already generalized form of Benford's law in [8], referring not to a first or other digit but to the numbers N(k) and $N_{max}$, is interpreted as an (incomplete) Legendre transform (like a Landau free energy or a free-energy density functional) between two thermodynamic potentials. The expression relating the corresponding partition functions becomes a generalized Zipf's law. We identify these quantities in terms of the variables involved, as well as the conjugate variables in the transform, which are the rank k and the inverse of the total number of data, 1/N. We also argued that this kind of deformed thermodynamics arises from the existence of a strong barrier to enter configurational phase space, which leaves only a fractal or multifractal subset of this space accessible to the system. A quantitative consequence of keeping $N_{max}$ finite is the reproduction of the small-rank bend displayed by real data before the power-law behavior sets in. The power-law regime in the theoretical expression persists up to infinite rank, k → ∞, representative of a sort of 'thermodynamic limit'.
More explicitly, the probability of observing the first digit n of a number N distributed according to $P(N) \sim N^{-\alpha}$ is given by
$$p(n) = \frac{(n+1)^{1-\alpha} - n^{1-\alpha}}{10^{1-\alpha} - 1}. \tag{1}$$
This is a generalization of Benford's law [8], to which it reduces in the limit α → 1. The rank k for a set of N numbers extracted from the basic distribution $P(N) \sim N^{-\alpha}$ is given by [8]
$$k = \int_{N(k)}^{N_{max}} N^{-\alpha}\,dN = \frac{N_{max}^{1-\alpha} - N(k)^{1-\alpha}}{1-\alpha}, \tag{2}$$
where $N_{max}$ and N(k) correspond, respectively, to rank k = 0 and to a generic rank k > 0. Eq. (2) treats the rank k as a continuous variable, and therefore the first value of the rank is k = 0. Inversion of eq. (2) in the limit $N_{max} \gg 1$ yields Zipf's law $N(k) \sim k^{-1/(\alpha-1)}$ [8].
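The generalized first-digit law is easy to evaluate numerically. The sketch below is our own illustration, not code from [8]. It assumes the data are confined to a single decade [1, 10), integrates $P(N) \sim N^{-\alpha}$ between n and n+1, and normalizes, which gives $p(n) = [(n+1)^{1-\alpha} - n^{1-\alpha}]/(10^{1-\alpha} - 1)$. The function name `first_digit_prob` is hypothetical.

```python
import math

def first_digit_prob(n: int, alpha: float) -> float:
    """Probability of leading digit n (1..9) for data with P(N) ~ N**(-alpha).

    P(N) is integrated between n and n + 1 and normalized over the single
    decade [1, 10), an assumption of this sketch.
    """
    if not 1 <= n <= 9:
        raise ValueError("leading digit must be in 1..9")
    if abs(alpha - 1.0) < 1e-12:
        # alpha = 1: ordinary Benford law, log10(1 + 1/n)
        return math.log10(1.0 + 1.0 / n)
    a = 1.0 - alpha
    return ((n + 1) ** a - n ** a) / (10.0 ** a - 1.0)

# Compare Benford (alpha = 1) with the Zipf-like case alpha = 2
for alpha in (1.0, 2.0):
    probs = [first_digit_prob(n, alpha) for n in range(1, 10)]
    print(alpha, [round(p, 3) for p in probs], round(sum(probs), 12))
```

For α = 1 the values reduce to Benford's $\log_{10}(1 + 1/n)$; for α = 2 the leading digit 1 becomes markedly more frequent (5/9 instead of about 0.301).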
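As a convenient shorthand notation consider the q-deformed logarithmic function
$$\log_q(x) \equiv \frac{x^{1-q} - 1}{1-q},$$
with q ≠ 1 a real number, and its inverse, the q-deformed exponential function $\exp_q(x) \equiv [1 + (1-q)x]^{1/(1-q)}$, which reduce, respectively, to the ordinary logarithmic and exponential functions in the limit q → 1. In terms of these functions, eq. (2) and its inverse can be written more concisely as
$$\log_\alpha N(k) = \log_\alpha N_{max} - k \tag{3}$$
and
$$N(k) = \exp_\alpha\!\left(\log_\alpha N_{max} - k\right) = N_{max}\exp_\alpha\!\left(-N_{max}^{\alpha-1}\,k\right). \tag{4}$$
As shown in [3,4], eq. (4) is a generalization of Zipf's law that can reproduce quantitatively the behavior for small rank k observed in real data, where, as one would expect, $N_{max}$ is finite.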
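A minimal Python sketch of the q-deformed functions and of the generalized Zipf law $N(k) = \exp_\alpha(\log_\alpha N_{max} - k)$ follows. The helper names (`log_q`, `exp_q`, `rank_size`, `loglog_slope`) are illustrative, and raising an error outside the domain where $\exp_q$ is real and finite is our own convention. The sketch exhibits the two regimes discussed in the text: $N(0) = N_{max}$ stays finite (the small-rank bend), while for $N_{max} \gg 1$ the local log-log slope approaches $-1/(\alpha-1)$.

```python
import math

def log_q(x: float, q: float) -> float:
    """q-deformed logarithm (x**(1-q) - 1)/(1-q); ordinary ln x as q -> 1."""
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def exp_q(x: float, q: float) -> float:
    """q-deformed exponential [1 + (1-q) x]**(1/(1-q)), inverse of log_q."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        # Outside the real, finite domain (convention chosen for this sketch)
        raise ValueError("exp_q undefined for 1 + (1-q)x <= 0")
    return base ** (1.0 / (1.0 - q))

def rank_size(k: float, n_max: float, alpha: float) -> float:
    """Generalized Zipf law: N(k) = exp_alpha(log_alpha(N_max) - k)."""
    return exp_q(log_q(n_max, alpha) - k, alpha)

def loglog_slope(k1: float, k2: float, n_max: float, alpha: float) -> float:
    """Local log-log slope of N(k) between ranks k1 and k2."""
    rise = math.log(rank_size(k2, n_max, alpha)) - math.log(rank_size(k1, n_max, alpha))
    return rise / (math.log(k2) - math.log(k1))

# Finite N_max: N(0) = N_max; at large k the slope tends to -1/(alpha - 1)
print(rank_size(0.0, 1e3, 2.0), loglog_slope(1e4, 1e5, 1e12, 2.0))
```

For α = 2 the law reduces to $N(k) = (1/N_{max} + k)^{-1}$, which makes the crossover from the bend at small k to the $k^{-1}$ regime explicit.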
In Figure 1 we compare the numbers of occurrences of English words in a corpus with N(k) as given by eq. (4); the reproduction of the small-rank bend displayed by the data before the power-law behavior sets in is evident. In the theoretical expression this power-law regime persists up to infinite rank, k → ∞. Also, in the limit $N_{max} \gg 1$, α > 1, eq. (4) reduces to the power law $N(k) \sim k^{-1/(\alpha-1)}$.
A key physical interpretation of eq. (3) was put forward in [3,4]. It is based on the fact that both $\log_\alpha N_{max}$ and $\log_\alpha N(k)$ are given by the integrals
$$\log_\alpha N_{max} = \int_1^{N_{max}} N^{-\alpha}\,dN, \qquad \log_\alpha N(k) = \int_1^{N(k)} N^{-\alpha}\,dN, \tag{5}$$
and these in turn can be seen, when α = 1, to conform to the evaluation of the entropies $S_1(N_{max}) = \log N_{max}$ or $S_1(N(k)) = \log N(k)$, where each of N equally probable configurations in phase space has probability $P(N) = N^{-1}$. The same interpretation was extended to the case α > 1 by considering that $P(N) = N^{-\alpha}$ represents the probability of N equally probable configurations in a strongly restricted phase space. The entropies are now written as
$$S_\alpha(N_{max}) = \log_\alpha N_{max}, \qquad S_\alpha(N(k)) = \log_\alpha N(k), \tag{6}$$
where $N_{max}$ and N(k) play the roles of total configurational numbers or partition functions. Eq. (3) is then rewritten as
$$S_\alpha(N(k)) = S_\alpha(N_{max}) - k. \tag{7}$$
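The entropic reading rests on the identity $\log_\alpha x = \int_1^x N^{-\alpha}\,dN$. As a sketch (our own numerical check, not part of [3,4]), the integral can be evaluated with Simpson's rule and compared with the closed form. For α = 2 the rank relation gives $N(k) = (1/N_{max} + k)^{-1}$, which lets one confirm that the entropy at N(k) equals the entropy at $N_{max}$ minus k. The function `entropy_integral` and the sample values are hypothetical.

```python
import math

def entropy_integral(n_states: float, alpha: float, steps: int = 2000) -> float:
    """Integral of N**(-alpha) dN from 1 to n_states, by composite Simpson's rule.

    The substitution N = exp(s), dN = N ds, turns the integrand into
    exp((1 - alpha) * s), which is smooth on [0, ln(n_states)].
    """
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    upper = math.log(n_states)
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * math.exp((1.0 - alpha) * i * h)
    return total * h / 3.0

def log_q(x: float, q: float) -> float:
    """Closed-form q-logarithm, used here only for comparison."""
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

# alpha = 1 gives the ordinary entropy ln N; alpha = 2 gives log_2 N
print(entropy_integral(1e3, 1.0), math.log(1e3))
print(entropy_integral(1e3, 2.0), log_q(1e3, 2.0))

# Rank relation for alpha = 2, where N(k) = 1 / (1/N_max + k):
# the entropy at N(k) should equal the entropy at N_max minus k
n_max, k = 1e3, 0.5
n_k = 1.0 / (1.0 / n_max + k)
print(entropy_integral(n_k, 2.0), entropy_integral(n_max, 2.0) - k)
```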

The laws of Benford and Zipf from the perspective of nonlinear dynamics
To make explicit the analogy between the generalized laws of Benford and Zipf and the nonlinear dynamics of intermittency [4], it is necessary to recall briefly the Renormalization Group (RG) treatment of the tangent bifurcation that mediates the transition between chaotic and periodic attractors [5]. The common procedure to study the transition to chaos from a trajectory of period n starts with the n-th composition $f^{(n)}(x)$ of a one-dimensional map f(x) at such a bifurcation, followed by an expansion in the neighborhood of one of the n points tangent to the line with unit slope [5]. With complete generality one obtains
$$f^{(n)}(x) = x + u|x|^{z} + \dots, \qquad z > 1, \tag{9}$$
where z is the degree of nonlinearity at tangency. The RG fixed-point map $f^*(x)$ is the solution of
$$f^*(x) = \alpha f^*\!\left(f^*(x/\alpha)\right), \tag{10}$$
together with a specific value of the rescaling factor α (here not to be confused with the exponent of P(N)), such that upon expansion around x = 0 it reproduces eq. (9). An exact analytical expression for $f^*(x)$ was obtained in [10] with the use of the assumed translation property of an auxiliary variable, $y = x^{1-z}$. This property is written as
$$y' = y - (z-1)u, \tag{11}$$
or, equivalently, as
$$x' = x\left[1 - (z-1)u\,x^{z-1}\right]^{-1/(z-1)} = x\exp_z\!\left(u\,x^{z-1}\right). \tag{12}$$
It is straightforward to verify that $x' = f^*(x)$ as given by eq. (12) satisfies eq. (10) with $\alpha = 2^{1/(z-1)}$. Repeated iteration of eq. (11) leads to
$$y_t = y_0 - (z-1)u\,t, \tag{13}$$
so that the dependence of all trajectories on the iteration number, or time, t is given by
$$\log_z x_t = \log_z x_0 + u\,t \tag{14}$$
and
$$x_t = \exp_z\!\left(\log_z x_0 + u\,t\right) = x_0\exp_z\!\left(u\,x_0^{z-1}\,t\right), \tag{15}$$
where $x_0$ is the initial position. The q-deformed properties of the tangent bifurcation are discussed at greater length in [11]. The parallel between eqs. (14) and (15) and eqs. (3) and (4), respectively, is clear, and therefore we conclude that the dynamical system represented by the fixed-point map $f^*(x)$ operates in accordance with the same statistical-mechanical property described in the previous section for the generalized laws. We note that the absence of an upper bound for the rank k in eqs. (3) and (4) is equivalent to the tangency condition in the map. Accordingly, we look at the changes in N(k) brought about by shifting the corresponding map off tangency (Figure 2); that is, we consider the trajectories $x_t$, with initial positions $x_0$, of the shifted map, with the identifications k = t, $N(k) = x_t + x^*$, $N_{max} = x_0 + x^*$ and α = z, where the translation $x^*$ ensures that all N(k) ≥ 0. In Figure 3 we illustrate the capability of this approach to reproduce quantitatively real data for the ranking of eigenfactors (a measure of the overall value) of physics journals (http://www.eigenfactor.org/index.php).
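The fixed-point map and its translation property can be checked numerically. In this sketch (ours, not taken from [10]) we iterate $f^*(x) = x\exp_z(u\,x^{z-1})$ for x > 0 and choose u = −1; this sign is our assumption, made so that trajectories decrease monotonically, like N(k), and $\log_z x_t = \log_z x_0 - t$ takes the form of the rank relation with k = t and z playing the role of α. We also verify the RG fixed-point condition with rescaling factor $a = 2^{1/(z-1)}$. All helper names and numerical values are hypothetical.

```python
import math

def log_q(x: float, q: float) -> float:
    """q-deformed logarithm (q != 1 assumed in this sketch)."""
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def exp_q(x: float, q: float) -> float:
    """q-deformed exponential, the inverse of log_q (q != 1 assumed)."""
    return (1.0 + (1.0 - q) * x) ** (1.0 / (1.0 - q))

def f_star(x: float, z: float, u: float) -> float:
    """Fixed-point map x' = x * exp_z(u * x**(z-1)), used here for x > 0."""
    return x * exp_q(u * x ** (z - 1.0), z)

def trajectory(x0: float, z: float, u: float, steps: int) -> list:
    """Positions x_0, x_1, ..., x_steps obtained by iterating f_star."""
    xs = [x0]
    for _ in range(steps):
        xs.append(f_star(xs[-1], z, u))
    return xs

z, u, x0, t = 2.5, -1.0, 3.0, 50  # illustrative values; u < 0 is our choice
xs = trajectory(x0, z, u, t)

# Translation property: log_z x_t = log_z x_0 + u t (here u t = -t)
print(log_q(xs[-1], z) - log_q(x0, z), u * t)

# Closed form of the trajectory: x_t = x_0 * exp_z(u * x_0**(z-1) * t)
print(xs[-1], x0 * exp_q(u * x0 ** (z - 1.0) * t, z))

# RG fixed-point condition f*(x) = a f*(f*(x/a)) with a = 2**(1/(z-1))
a = 2.0 ** (1.0 / (z - 1.0))
print(f_star(0.7, z, u), a * f_star(f_star(0.7 / a, z, u), z, u))
```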
In the intermittency route out of chaos it is relevant to determine the duration of the so-called laminar episodes [5], i.e. the average time spent by the trajectories going through the "bottleneck" formed in the region where the map is closest to the line of unit slope. Naturally, the duration of the laminar episodes diverges at the tangent bifurcation, where the Lyapunov exponent for the separation of trajectories vanishes. Interestingly, it is this property of the nonlinear dynamics that translates into the finite-size properties of the occurrence-rank function N(k). One more important result that follows from the analogy between nonlinear dynamics and the rank law is that the most common value for the degree of nonlinearity at tangency is z = 2, obtained when the map is analytic at x = 0 with nonzero second derivative, and this implies α = 2, close to the values observed for most sets of real data.
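The divergence of the laminar duration can be illustrated with a few lines of code. The sketch below iterates the normal form $x' = x + u x^2 + \varepsilon$ (z = 2), shifted off tangency by a small ε > 0; this specific map, the bottleneck width 0.5 and the function name `laminar_length` are illustrative choices on our part. It counts the steps needed to cross the bottleneck from x = −0.5 to x = 0.5. For z = 2 the standard continuum estimate gives a duration of about $\pi/\sqrt{u\varepsilon}$, which diverges as ε → 0, i.e. at tangency.

```python
import math

def laminar_length(eps: float, u: float = 1.0, width: float = 0.5) -> int:
    """Number of iterations of x' = x + u*x**2 + eps needed to go from
    x = -width to x > width, i.e. through the bottleneck near x = 0.

    Assumes u > 0 and eps > 0, so every step increases x and the loop ends.
    """
    if u <= 0 or eps <= 0:
        raise ValueError("u and eps must be positive")
    x, steps = -width, 0
    while x <= width:
        x += u * x * x + eps
        steps += 1
    return steps

for eps in (1e-2, 1e-4, 1e-6):
    n = laminar_length(eps)
    # For z = 2 the continuum estimate of the duration is pi / sqrt(u * eps)
    print(f"eps={eps:g}  steps={n}  steps*sqrt(eps)/pi={n * math.sqrt(eps) / math.pi:.3f}")
```

Reducing ε by a factor of 100 lengthens the laminar episode by roughly a factor of 10, in line with the $\varepsilon^{-1/2}$ scaling.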

The laws of Benford and Zipf in relation to critical phenomena
The local fluctuations of a thermal system undergoing a second-order phase transition are represented in a classical spin model by the deviations of the magnetization φ from its vanishing equilibrium average. These fluctuations generate transient magnetic domains, or critical clusters, on all size scales, whose combined behavior produces the familiar properties of critical phenomena. Critical clusters have been studied (see [6] and references therein) with the statistical-mechanical method of a coarse-grained free energy, such as the Landau-Ginzburg-Wilson (LGW) continuous spin model Hamiltonian, at the critical temperature and zero external field. The cluster of radius R has a fractal configuration whose amplitude φ grows in time and eventually collapses when the instability is reached. This process is described by a nonlinear map with tangency and feedback features, such that the time evolution of the cluster corresponds in the nonlinear system to a laminar episode of intermittent dynamics (see [6] and references therein). The coarse-grained phase space for a critical cluster is the set of possible values of the magnetization φ, and in a one-dimensional system (with no loss of generality, as the range of interactions can be adjusted to ensure criticality) the cluster is described by its profile φ(x) at each position x. It is also sufficient to study a symmetrical cluster, φ(−x) = φ(x).
The basic results are more readily obtained via the following phenomenological arguments [7]. Firstly, assume that the fractal properties of the cluster imply an under-occupation of the coarse-grained phase space. This under-occupation renders the ordinary entropy $S_1(R)$ of eq. (17), expressed in terms of $\phi_R \equiv \phi(R)$, nonextensive. That is, $S_1(R)$ does not grow linearly with R because the total magnetization of the cluster scales as
$$\Phi(R) = 2\int_0^R \phi(x)\,dx \sim R^{d_f},$$
where $d_f < 1$ is the fractal dimension of the cluster [6,7]. To restore the extensivity of the entropy we replace eq. (17) by a q-deformed entropy $S_q(R)$, built with the q-logarithmic function $\log_q$ defined above, and tune q to the specific value q > 1 that makes $S_q(R)$ grow linearly with R. We write the corresponding entropy expression at the center of the cluster, x = 0, in the same way. The resulting eq. (26) relates the values of the magnetization at the center and at the edge of the critical cluster; this is the main consequence of the above simple assumptions. The expression can be generalized to describe the magnetization profile φ(x) of the entire cluster. Furthermore, eq. (26) can be derived [6,7] from an LGW Hamiltonian within the saddle-point approximation. This more fundamental approach allows us to determine the parameters of eq. (26) in terms of Hamiltonian variables. In particular, the index q is given by q = (δ+1)/2, where δ is the isothermal critical exponent [6].
Comparison of eqs. (25) and (26) with eqs. (14) and (15), respectively, and with eqs. (3) and (4), respectively, demonstrates the equivalence between the properties of size-rank functions, dynamics at the intermittency transition out of chaos, and the properties of critical clusters.
Consequently, these three apparently different problems share the same statistical-mechanical interpretation. It should be noted however that the cluster magnetization  R above is defined only for positive values of R and therefore this function coincides with the trajectory x t at or close to the tangent bifurcation with its positive portion x t   0, i.e. a trajectory out of tangency or past the midpoint of the "bottleneck".
Similarly  R translates only to the portion of the sizerank function N(k) that corresponds to the finite-size induced down-bend that follows for large rank after the approximate power-law regime.

Summary and discussion
We have described the equivalence of the theoretical treatments of three topics encountered in different fields: (i) the empirical laws of Zipf and Benford for diverse sets of numerical data; (ii) the dynamics associated with the intermittency route to chaos in nonlinear maps; and (iii) the nature of large critical clusters in many-body systems. These analogies are solid because the three problems share the same main expressions for their description, and these expressions in turn share the same origin. We interpreted these expressions as the manifestation of a thermodynamic, or statistical-mechanical, structure that arises from a restriction on the ability of these systems to sample a large number of configurations. More specifically, the theoretical expressions for the first-digit and size-rank laws, eqs. (1) and (2), were derived in [8] under the basic assumption that the data sets obeying them are statistically well reproduced when extracted from a power-law distribution $P(N) \sim N^{-\alpha}$. That is, the deviation from unity of the exponent α implies a restricted access to the phase space of data configurations which, when enumerated, produce the numbers N. The restriction involves an accessible subset of this space with a scale-invariant property, i.e., a fractal set, as indicated by the power law $N^{-\alpha}$. Our statistical-mechanical viewpoint becomes evident when P(N) is seen to represent the probability distribution of N equally probable configurations in the phase space for the data, which in turn suggests the definition of the generalized entropies in eq. (6). The straightforward transformation of the pair of eqs. (3) and (4) into either eqs. (14) and (15) or eqs. (25) and (26) reveals that the statistical-mechanical interpretation applies also to the critical nonlinear dynamics at the tangent bifurcation and to the growth and collapse of critical clusters at second-order transitions. 
The degree of deformation, α, z, or q, in the theory is an important index that characterizes the phase-space restriction. For instance, when α = z = q = 2 one has: (i) the standard Zipf law, a power-law decay $N(k) \sim k^{-1}$ with exponent minus one; (ii) the most general situation for a map at tangency, nonzero curvature; and (iii) the classical value of the critical isothermal exponent, δ = 3. More details and fuller discussions of the results commented on here are found in [4,6,7,11].
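The arithmetic linking the three indices fits in a few lines. This hypothetical helper assumes the identification α = z = q used in the text, together with $N(k) \sim k^{-1/(\alpha-1)}$ from section 1 and q = (δ+1)/2 from section 3; Python's standard `fractions` module keeps the results exact.

```python
from fractions import Fraction

def related_indices(deformation):
    """Exponents implied by a common deformation index alpha = z = q.

    Returns (zipf_exponent, delta), using N(k) ~ k**(-1/(alpha-1)) and
    q = (delta + 1)/2, as quoted in the text.
    """
    q = Fraction(deformation)
    if q <= 1:
        raise ValueError("the deformation index must exceed 1")
    zipf_exponent = 1 / (q - 1)  # N(k) decays as k**(-zipf_exponent)
    delta = 2 * q - 1            # inverts q = (delta + 1)/2
    return zipf_exponent, delta

print(related_indices(2))  # the case alpha = z = q = 2 discussed in the text
print(related_indices(Fraction(3, 2)))
```

For the case α = z = q = 2 this gives a Zipf exponent of 1 and δ = 3, matching items (i) and (iii) above.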