A functional equation related to generalized entropies and the modular group

We solve a functional equation connected to the algebraic characterization of generalized information functions. To prove the symmetry of the solution, we study a related system of functional equations, which involves two homographies. These transformations generate the modular group, and this fact plays a crucial role in solving the system. The method suggests a more general relation between conditional probabilities and arithmetic.


Motivation and results
In this paper, we study the measurable solutions u : [0, 1] → R of the functional equation (1.1), which is required to hold for all x, y ∈ [0, 1) such that x + y ∈ [0, 1]. The parameter α can take any positive real value.

This equation appears in the context of algebraic characterizations of information functions. Given a random variable X whose range is a finite set E_X, a measure of its "information content" is supposed to be a function f[X] of the probability laws on E_X. The most important example of such a function is the Shannon-Gibbs entropy

    S_1[X](p) = −∑_{x∈E_X} p(x) log p(x),

where 0 log 0 equals 0 by convention. Shannon entropy satisfies a remarkable property, called the chain rule, that we now describe. Let X (resp. Y) be a variable with range E_X (resp. E_Y); both E_X and E_Y are supposed to be finite sets. The couple (X, Y) takes values in a subset E_XY of E_X × E_Y, and any probability p on E_XY induces by marginalization laws X_*p on E_X and Y_*p on E_Y. For instance,

    X_*p(x) = ∑_{y : (x,y) ∈ E_XY} p(x, y).    (1.4)

The chain rule corresponds to the identities

    S_1[(X, Y)](p) = S_1[X](X_*p) + ∑_{x∈E_X} X_*p(x) S_1[Y](p|_{X=x}),

and the analogous one with the roles of X and Y interchanged, where p|_{X=x} denotes the conditional probability y ↦ p(x, y)/X_*p(x). These identities reflect the third axiom used by Shannon to characterize an information measure H: "if a choice be broken down into two successive choices, the original H should be the weighted sum of the individual values of H" [7].

There is a deformed version of Shannon entropy, called generalized entropy of degree α [1, Ch. 6]. For any α ∈ (0, ∞) \ {1}, it is defined as

    S_α[X](p) = (2^{1−α} − 1)^{−1} (∑_{x∈E_X} p(x)^α − 1).

For a binary variable, these entropies are given by the functions s_1(x) = −x log x − (1 − x) log(1 − x) and s_α(x) = (2^{1−α} − 1)^{−1}(x^α + (1 − x)^α − 1). An information function f is then subject to two axioms:

A. f[X](δ) = 0 whenever δ is a Dirac measure (a measure concentrated on a singleton), which means that variables with deterministic outputs do not give (new) information when measured;

B. the generalized α-chain rule holds, i.e. for any variables X and Y with finite range,

    f[(X, Y)](p) = f[X](X_*p) + ∑_{x∈E_X} (X_*p(x))^α f[Y](p|_{X=x}).    (1.8)

The simplest non-trivial case corresponds to binary variables X and Y. Computing f[(X, Y)] through the α-chain rule in the two possible orders gives the expressions (1.9) and (1.10); the equality between the right-hand sides of (1.9) and (1.10) reads as (1.11) for general α, and takes the corresponding un-deformed form when α = 1.
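The definitions above are easy to experiment with. The following self-contained snippet (illustrative code, not part of the paper; the function names are mine) implements the entropy of degree α and checks the α-chain rule on a small joint law supported on three points, for several values of α:

```python
import math

def S(alpha, probs):
    """Entropy of degree alpha; the Shannon-Gibbs case is alpha == 1 (base-2 logarithm)."""
    probs = [p for p in probs if p > 0]
    if alpha == 1:
        return -sum(p * math.log2(p) for p in probs)
    return (sum(p ** alpha for p in probs) - 1) / (2 ** (1 - alpha) - 1)

def chain_rule_sides(alpha, joint):
    """joint maps (x, y) to p(x, y). Returns both sides of the alpha-chain rule."""
    lhs = S(alpha, joint.values())
    xs = {x for x, _ in joint}
    marg = {x: sum(p for (x2, _), p in joint.items() if x2 == x) for x in xs}  # X_* p
    rhs = S(alpha, marg.values())
    for x in xs:
        cond = [p / marg[x] for (x2, _), p in joint.items() if x2 == x]  # p restricted to X = x
        rhs += marg[x] ** alpha * S(alpha, cond)
    return lhs, rhs

joint = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.5}  # E_XY has three points
for alpha in (0.5, 1, 2, 3):
    lhs, rhs = chain_rule_sides(alpha, joint)
    assert abs(lhs - rhs) < 1e-9
```

The deterministic conditional law at X = 1 contributes zero entropy, which is exactly axiom A at work.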
The first (Theorem 1.2, on regularity) is proved analytically, by means of standard techniques in the field of functional equations (cf. [1, 5, 9]), and the second (Theorem 1.3, on symmetry) by a novel geometrical argument, relating the equation to the action of the modular group on the projective line. Theorems 1.2 and 1.3 above imply that any measurable solution u of (1.1) must be symmetric, i.e. u(x) = u(1 − x) for all x ∈ [0, 1], and therefore satisfies

    u(x) + (1 − x)^α u(y/(1 − x)) = u(y) + (1 − y)^α u(x/(1 − y))    (1.12)

whenever x, y ∈ [0, 1) and x + y ∈ [0, 1]. When α = 1, this equation is called "the fundamental equation of information theory"; it first appeared in the work of Tverberg [9], who deduced it from a characterization of an "information function" that not only supposed a version of the chain rule, but also the invariance of the function under permutations of its arguments. Daróczy introduced the fundamental equation for general α > 0, and showed that it can be deduced from an axiomatic characterization analogous to that of Tverberg, which again supposed invariance under permutations along with a deformed chain rule akin to (1.8); see [3, Thm. 5].
For α = 1, Tverberg [9] showed that, if u : [0, 1] → R is symmetric, Lebesgue integrable, and satisfies (1.12), then it must be a multiple of s_1(x). In [5], Kannappan and Ng weakened the regularity condition, showing that all measurable solutions of (1.12) have the form u(x) = A s_1(x) + Bx (where A and B are arbitrary real constants), which reduces to u(x) = A s_1(x) when u is symmetric. In fact, they solved some generalizations of the fundamental equation, proving among other things that, when α = 1, the only measurable solutions of (1.1) are multiples of s_1(x).
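These closed-form solutions can be tested numerically. The sketch below (my own code; it uses the degree-α form u(x) + (1 − x)^α u(y/(1 − x)) = u(y) + (1 − y)^α u(x/(1 − y)) of the fundamental equation) checks that s_α(x) = (2^{1−α} − 1)^{−1}(x^α + (1 − x)^α − 1), with s_1 the binary Shannon entropy, satisfies it for several parameters and arguments:

```python
import math

def s(alpha, x):
    """Binary entropy of degree alpha; s_1 is the Shannon case (base-2 logarithm)."""
    if x in (0, 1):
        return 0.0
    if alpha == 1:
        return -x * math.log2(x) - (1 - x) * math.log2(1 - x)
    return (x ** alpha + (1 - x) ** alpha - 1) / (2 ** (1 - alpha) - 1)

def fundamental_gap(alpha, x, y):
    """Difference of the two sides of the degree-alpha fundamental equation for u = s_alpha."""
    return (s(alpha, x) + (1 - x) ** alpha * s(alpha, y / (1 - x))
            - s(alpha, y) - (1 - y) ** alpha * s(alpha, x / (1 - y)))

for alpha in (0.5, 1, 2, 3):
    for x, y in [(0.1, 0.3), (0.25, 0.5), (0.4, 0.6), (0.7, 0.2)]:
        assert abs(fundamental_gap(alpha, x, y)) < 1e-9
```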
For α ≠ 1, Daróczy [3] established that any u : [0, 1] → R that satisfies (1.12) and u(0) = u(1) must be proportional to s_α(x), without any hypotheses on the regularity of u. The proof starts by showing that any solution of (1.12) must satisfy u(0) = 0 (setting x = 0), and hence be symmetric (setting y = 1 − x). Since we are able to prove symmetry of the solutions of (1.1) restricted to rational arguments without any regularity hypothesis, we also get the following result. Proof. Set x = 0 in (1.1) to conclude that u(1) = 0, and y = 0 to obtain u(0) = 0. Moreover, u must be symmetric (Theorem 1.3), hence it fulfills (1.12) when the arguments are rational. Given these facts, Daróczy's proof in [3, p. 39] applies with no modifications when restricted to p, q ∈ Q.
More details on the characterization of information functions by means of functional equations can be found in the classical reference [1], which gives a detailed historical introduction. Reference [2] summarizes more recent developments in connection with homological algebra.
It is quite remarkable that Theorem 1.1 serves as a fundamental result to prove that, up to a multiplicative constant, {S_α[X]}_{X∈S} is the only collection of measurable functionals (not necessarily invariant under permutations) that satisfies the corresponding α-chain rule, for any generic set of random variables S. To do this, one introduces an adapted cohomology theory, called information cohomology [2], in which the chain rule corresponds to the 1-cocycle condition and thus has an algebro-topological meaning. The details can be found in the dissertation [10].

The modular group
The group G = SL_2(Z)/{±I} is called the modular group; it is the image of SL_2(Z) in PGL_2(R). We keep using matrix notation for the images in this quotient. We make G act on P^1(R) by homographies: an element with matrix (a, b; c, d) maps x ∈ P^1(R) to (ax + b)/(cx + d), with the usual conventions when x = ∞ or when cx + d = 0.
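Concretely, the action is ordinary homography evaluation with the conventions at infinity made explicit. A minimal sketch in exact rational arithmetic (my own code; None plays the role of ∞), shown on the standard generators S : x ↦ −1/x and T : x ↦ x + 1 of the modular group, which reappear below:

```python
from fractions import Fraction

INF = None  # the point at infinity of the projective line

def act(m, x):
    """Act on P^1(Q) by the homography x -> (ax + b)/(cx + d), with m = (a, b, c, d)."""
    a, b, c, d = m
    if x is INF:
        return INF if c == 0 else Fraction(a, c)
    num, den = a * x + b, c * x + d
    return INF if den == 0 else num / den

S_mat = (0, -1, 1, 0)  # S : x -> -1/x
T_mat = (1, 1, 0, 1)   # T : x -> x + 1

assert act(T_mat, Fraction(1, 2)) == Fraction(3, 2)
assert act(S_mat, Fraction(0)) is INF   # 0 is sent to infinity
assert act(S_mat, INF) == 0             # infinity is sent to 0
```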

Regularity: proof of Theorem 1.2
Lemma 3 in [5] implies that u is locally bounded on (0, 1) and hence locally integrable. Their proof is for α = 1, but the argument applies to the general case with almost no modification, replacing the identity used there by its α-deformed analogue, which is evidently valid too whenever x, y ∈ (0, 1).
To prove differentiability, we also follow the method in [5], already present in [9]. Fix an arbitrary y_0 ∈ (0, 1); it is then possible to choose s, t ∈ (0, 1), s < t, such that all the arguments appearing in (1.1) stay in (0, 1) for all y in a certain neighborhood of y_0. We integrate (1.1) with respect to x, between s and t, to obtain (3.1). The continuity of the right-hand side of (3.1) as a function of y at y_0 implies that u is continuous at y_0, and therefore on (0, 1). In turn, once u is known to be continuous, the right-hand side of (3.1) is differentiable in y, so u is differentiable at y_0. An iterated application of this argument shows that u is infinitely differentiable on (0, 1).

Symmetry: proof of Theorem 1.3
Define the function h : [0, 1] → R by h(x) = u(x) − u(1 − x), the defect of symmetry of u. Observe that h is anti-symmetric around 1/2, that is, h(x) = −h(1 − x) for all x. Let now z ∈ (1/2, 1) be arbitrary and use the substitution x = y = 1 − z in (1.1) to derive the identity (4.6). Using the anti-symmetry of h to modify the right-hand side of the previous equation, we also deduce (4.7). Setting x = 0 (respectively y = 0) in (1.1), we conclude that u(1) = 0 (resp. u(0) = 0). Hence, the function h is subject to the boundary conditions h(0) = h(1) = 0. We establish first the anti-symmetry around 1/2 of the extended h (Lemma 4.2), which implies that (4.7) follows from (4.6); the latter is a consequence of Lemmas 4.3-4.7.
Proof. Since h is periodic, (4.8) is equivalent to (4.9), and the change of variables u = x − 1 in the latter establishes (4.10).
Proof. This follows immediately from the previous lemma, together with the anti-symmetry property of Lemma 4.2.
On the other hand, for x ≤ 0, the preceding results apply. The transformations x → (2x − 1)/x and x → (1 − x)/x in Eqs. (4.6) and (4.7) are homographies of the real projective line P^1(R), which we denote respectively by α and β. They correspond to the elements A = (2, −1; 1, 0) and B = (−1, 1; 1, 0) of G, which satisfy AB^{−1} = (−1, 1; 0, 1); this last matrix corresponds to x → 1 − x. Proof. Let S = (0, −1; 1, 0) and T = (1, 1; 0, 1) be the standard generators of G, and set P = S^{−1}T^{−1} = (0, 1; −1, 1).

One has, by a direct computation with the matrices above, P A P^{−1} = T^{−1} and S = T^{−3} P B^{−2} P^{−1}. Inverting these relations, we obtain (4.22). Let X be an arbitrary element of G. Since Y = P X P^{−1} ∈ G and G is generated by S and T, the element Y is a word in S and T. In consequence, X is a word in P^{−1}SP and P^{−1}TP, which in turn are words in A and B^2.
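The matrix identities used in this proof can be checked mechanically. The snippet below (my own verification code, not from the paper) works projectively, i.e. up to an overall sign, with the matrices S, T, A, B read off from the homographies x ↦ −1/x, x ↦ x + 1, x ↦ (2x − 1)/x and x ↦ (1 − x)/x:

```python
def mul(m1, m2):
    """Product of 2x2 matrices stored as (a, b, c, d), meaning (a, b; c, d)."""
    a1, b1, c1, d1 = m1
    a2, b2, c2, d2 = m2
    return (a1*a2 + b1*c2, a1*b2 + b1*d2, c1*a2 + d1*c2, c1*b2 + d1*d2)

def inv(m):
    """Adjugate: equals det(m) times the inverse, hence the inverse up to sign here."""
    a, b, c, d = m
    return (d, -b, -c, a)

def proj_eq(m, n):
    """Equality in PGL_2: the matrices agree up to an overall sign."""
    return m == n or m == tuple(-t for t in n)

S = (0, -1, 1, 0)   # x -> -1/x
T = (1, 1, 0, 1)    # x -> x + 1
A = (2, -1, 1, 0)   # alpha : x -> (2x - 1)/x
B = (-1, 1, 1, 0)   # beta  : x -> (1 - x)/x
P = mul(inv(S), inv(T))

assert proj_eq(mul(mul(P, A), inv(P)), inv(T))        # P A P^{-1} = T^{-1}
T3_inv = mul(mul(inv(T), inv(T)), inv(T))
assert proj_eq(mul(mul(mul(T3_inv, P), inv(mul(B, B))), inv(P)), S)  # S = T^{-3} P B^{-2} P^{-1}
assert proj_eq(mul(A, inv(B)), (-1, 1, 0, 1))         # A B^{-1} is x -> 1 - x
```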
It is possible to find explicit formulas for S and T in terms of A and B^2. Since P = S^{−1}T^{−1}, we deduce that P S P^{−1} = S^{−1}T^{−1}STS and P T P^{−1} = S^{−1}T^{−1}T·TS = S^{−1}TS. Hence, by virtue of (4.22), S and T can be written as explicit words in A and B^2. To finish our proof of Theorem 1.3, we remark that the orbit of 0 under the action of G on P^1(R) is Q ∪ {∞}. This is a consequence of Bezout's identity: for every point [p : q] ∈ P^1(R) representing a reduced fraction p/q ≠ 0 (p, q ∈ Z \ {0} and coprime), there are two integers x, y such that xq − yp = 1.
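The Bezout argument is effective: extended Euclid produces a matrix whose homography sends 0 to a prescribed reduced fraction. A sketch (my own code, written for positive coprime p and q; the general case only needs sign bookkeeping):

```python
from fractions import Fraction

def ext_gcd(a, b):
    """Extended Euclid: returns (g, u, v) with u*a + v*b == g == gcd(a, b)."""
    if b == 0:
        return (a, 1, 0)
    g, u, v = ext_gcd(b, a % b)
    return (g, v, u - (a // b) * v)

def send_zero_to(p, q):
    """A matrix (a, b; c, d) in SL_2(Z) whose homography x -> (ax+b)/(cx+d) maps 0 to p/q."""
    g, u, v = ext_gcd(q, p)
    assert g == 1, "p/q must be a reduced fraction"
    return (u, p, -v, q)   # determinant u*q - p*(-v) = u*q + v*p = 1

m = send_zero_to(3, 7)
a, b, c, d = m
assert a * d - b * c == 1                # the matrix lies in SL_2(Z)
assert Fraction(b, d) == Fraction(3, 7)  # its homography sends 0 to 3/7
```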
Proof. Since the orbit in P^1(R) of 1/2 under the group of homographies generated by A and B^2 (i.e. G itself) contains the whole set of rational numbers Q, there exists a word w = w_n ∘ ··· ∘ w_1 such that r = w_n ∘ ··· ∘ w_1(1/2), where each w_i equals α, β or one of their inverses.
If some iterate equals 0 or ∞, the sequence w can be modified to avoid this. Let i ∈ {0, ..., n} be the largest index such that the iterate x_i := w_i ∘ ··· ∘ w_1(1/2) belongs to {0, ∞}; in fact, i < n because r ≠ 0, ∞.
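The existence of such words can also be demonstrated computationally. The following sketch (my own code, not from the paper) runs a breadth-first search over words in α : x ↦ (2x − 1)/x, β : x ↦ (1 − x)/x and their inverses, acting on P^1(Q) with exact rational arithmetic:

```python
from collections import deque
from fractions import Fraction

INF = None  # the point at infinity of P^1

# Matrices (a, b; c, d) of the homographies x -> (ax + b)/(cx + d):
TABLE = {
    "a": (2, -1, 1, 0),   # alpha : x -> (2x - 1)/x
    "A": (0, 1, -1, 2),   # alpha^{-1} : x -> 1/(2 - x)
    "b": (-1, 1, 1, 0),   # beta : x -> (1 - x)/x
    "B": (0, 1, 1, 1),    # beta^{-1} : x -> 1/(x + 1)
}

def apply(name, x):
    """Apply alpha, beta or one of their inverses to a point of P^1(Q)."""
    a, b, c, d = TABLE[name]
    if x is INF:
        return INF if c == 0 else Fraction(a, c)
    num, den = a * x + b, c * x + d
    return INF if den == 0 else num / den

def word_reaching(target, start=Fraction(1, 2), max_len=12):
    """Breadth-first search for a word sending start to target; letters in order of application."""
    seen = {start: ""}
    queue = deque([start])
    while queue:
        x = queue.popleft()
        if x == target:
            return seen[x]
        if len(seen[x]) >= max_len:
            continue
        for name in "aAbB":
            y = apply(name, x)
            if y not in seen:
                seen[y] = seen[x] + name
                queue.append(y)
    return None
```

For instance, word_reaching(Fraction(3, 5)) finds a short word; replaying its letters on 1/2, one letter at a time, lands exactly on 3/5.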