# A functional equation related to generalized entropies and the modular group

## Abstract

We solve a functional equation connected to the algebraic characterization of generalized information functions. To prove the symmetry of the solution, we study a related system of functional equations, which involves two homographies. These transformations generate the modular group, and this fact plays a crucial role in solving the system. The method suggests a more general relation between conditional probabilities and arithmetic.

## Motivation and results

In this paper, we study the measurable solutions $$u:[0,1]\rightarrow {\mathbb {R}}$$ of the functional equation

\begin{aligned} u(1-x)+(1-x)^\alpha u\left( \frac{y}{1-x}\right) = u(y) + (1-y)^\alpha u\left( \frac{1-x-y}{1-y}\right) \end{aligned}
(1.1)

for all $$x,y \in [0,1)$$ such that $$x+y\in [0,1]$$. The parameter $$\alpha$$ can take any positive real value.

This equation appears in the context of algebraic characterizations of information functions. Given a random variable X whose range is a finite set $$E_X$$, a measure of its “information content” is supposed to be a function $$f[X]: \Delta (E_X) \rightarrow {\mathbb {R}}$$, where $$\Delta (E_X)$$ denotes the set of probabilities on $$E_X$$,

\begin{aligned} \Delta (E_X) =\left\{ \,p:E_X\rightarrow [0,1] \, \big \vert \, \sum _{x\in E_X}p(x) = 1\,\right\} . \end{aligned}
(1.2)

The most important example of such a function is the Shannon-Gibbs entropy

\begin{aligned} S_1[X](p) := -\sum _{x\in E_X} p(x)\log p(x), \end{aligned}
(1.3)

where $$0\log 0$$ equals 0 by convention.

Shannon entropy satisfies a remarkable property, called the chain rule, that we now describe. Let X (resp. Y) be a variable with range $$E_X$$ (resp. $$E_Y$$); both $$E_X$$ and $$E_Y$$ are supposed to be finite sets. The couple (X, Y) takes values in a subset $$E_{XY}$$ of $$E_X\times E_Y$$, and any probability p on $$E_{XY}$$ induces by marginalization laws $$X_*p$$ on $$E_X$$ and $$Y_*p$$ on $$E_Y$$. For instance,

\begin{aligned} X_*p(x) = \sum _{y: (x,y)\in E_{XY}} p(x,y). \end{aligned}
(1.4)

The chain rule corresponds to the identities

\begin{aligned} S_1[(X,Y)](p)&= S_1[X](X_*p) + \sum _{x\in E_X} X_*p(x) S_1[Y](Y_*(p|_{X=x})), \end{aligned}
(1.5)
\begin{aligned} S_1[(X,Y)](p)&= S_1[Y](Y_*p) + \sum _{y\in E_Y} Y_*p(y) S_1[X](X_*(p|_{Y=y})), \end{aligned}
(1.6)

where $$p|_{X=x}$$ denotes the conditional probability $$y\mapsto p(x,y)/X_*p(x)$$. These identities reflect the third axiom used by Shannon to characterize an information measure H: “if a choice be broken down into two successive choices, the original H should be the weighted sum of the individual values of H” [7].
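As a concrete illustration, the chain rule (1.5) can be checked numerically on a small joint law. The sketch below is our own (helper names such as `S1` and `cond_Y` are not from the paper); it uses base-2 logarithms, but the identity holds for any base.

```python
# Numerical check (our own sketch) of the chain rule (1.5) for the
# Shannon-Gibbs entropy, on a joint law over E_XY = {(0,0), (1,0), (0,1)},
# the three-point space used later in this section.
from math import log2

def S1(probs):
    """Shannon-Gibbs entropy of a probability vector, with 0 log 0 := 0."""
    return -sum(q * log2(q) for q in probs if q > 0)

p = {(0, 0): 0.5, (1, 0): 0.3, (0, 1): 0.2}   # a joint law on E_XY

# Marginal X_*p and the conditional laws p|_{X=x} pushed forward to E_Y.
Xp = {x: sum(v for (a, _), v in p.items() if a == x) for x in (0, 1)}
def cond_Y(x):
    return [v / Xp[x] for (a, _), v in p.items() if a == x]

lhs = S1(p.values())                                   # S_1[(X,Y)](p)
rhs = S1(Xp.values()) + sum(Xp[x] * S1(cond_Y(x)) for x in Xp)
assert abs(lhs - rhs) < 1e-12                          # (1.5) holds
```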

There is a deformed version of Shannon entropy, called generalized entropy of degree $$\alpha$$ [1, Ch. 6]. For any $$\alpha \in (0,\infty )\setminus \{1\}$$, it is defined as

\begin{aligned} S_\alpha [X](p) := \frac{1}{1-\alpha }\left( \sum _{x\in E_X} p(x)^\alpha -1\right) . \end{aligned}
(1.7)

This function was introduced by Havrda and Charvát [4]. Constantino Tsallis popularized its use in physics, as the fundamental quantity of non-extensive statistical mechanics [8], so $$S_\alpha$$ is also called Tsallis $$\alpha$$-entropy. It satisfies a deformed version of the chain rule:

\begin{aligned} S_\alpha [(X,Y)](p) = S_\alpha [X](X_*p) + \sum _{x\in E_X} (X_*p(x))^\alpha S_\alpha [Y](Y_*(p|_{X=x})). \end{aligned}
(1.8)
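The deformed rule can be verified in the same spirit; the following sketch (ours) checks (1.8) for $$\alpha = 2$$ on the product space $$E_X\times E_Y=\{0,1\}^2$$.

```python
# Check (our own sketch) of the deformed chain rule (1.8) for the Tsallis
# entropy S_alpha, with alpha = 2, on a joint law over {0,1} x {0,1}.
def S(probs, alpha):
    """Tsallis entropy (1.7) of a probability vector."""
    return (sum(q**alpha for q in probs) - 1) / (1 - alpha)

alpha = 2.0
p = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
Xp = {x: p[(x, 0)] + p[(x, 1)] for x in (0, 1)}            # marginal X_*p
cond = {x: [p[(x, y)] / Xp[x] for y in (0, 1)] for x in (0, 1)}

lhs = S(p.values(), alpha)
rhs = S(Xp.values(), alpha) + sum(Xp[x]**alpha * S(cond[x], alpha)
                                  for x in (0, 1))
assert abs(lhs - rhs) < 1e-12                               # (1.8) holds
```

Note that the conditional entropies are weighted by $$(X_*p(x))^\alpha$$ rather than $$X_*p(x)$$; this weight is the only deformation with respect to (1.5).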

Suppose now that, given $$\alpha >0$$, we want to find the most general functions f[X]—for a given collection of finite random variables X—such that

1. **A.** $$f[X](\delta )=0$$ whenever $$\delta$$ is any Dirac measure—a measure concentrated on a singleton—which means that variables with deterministic outputs do not give (new) information when measured;

2. **B.** the generalized $$\alpha$$-chain rule holds, i.e. for any variables X and Y with finite range (see Note 1),

\begin{aligned} f[(X,Y)](p)&= f[X](X_*p) + \sum _{x\in E_X} (X_*p(x))^\alpha f[Y](Y_*(p|_{X=x})), \end{aligned}
(1.9)
\begin{aligned} f[(X,Y)](p)&= f[Y](Y_*p) + \sum _{y\in E_Y} (Y_*p(y))^\alpha f[X](X_*(p|_{Y=y})). \end{aligned}
(1.10)

The simplest non-trivial case corresponds to $$E_X=E_Y=\{0,1\}$$ and $$E_{XY}=\{(0,0),(1,0),(0,1)\}$$; a probability p on $$E_{XY}$$ is a triple $$p(0,0)=a$$, $$p(1,0)=b$$, $$p(0,1)=c$$, such that $$X_*p=(a+c,b)$$ and $$Y_*p=(a+b,c)$$. The equality between the right-hand sides of (1.9) and (1.10) reads

\begin{aligned}&f[X](a+c,b) + (1-b)^\alpha f[Y]\left( \frac{a}{1-b},\frac{c}{1-b}\right) \nonumber \\&\quad =f[Y](a+b,c)+(1-c)^\alpha f[X]\left( \frac{a}{1-c},\frac{b}{1-c}\right) , \end{aligned}
(1.11)

for any triple $$(a,b,c)\in [0,1]^3$$ such that $$a+b+c=1$$. Setting $$a=0$$ and using assumption A, we conclude that $$f[X](c,1-c)=f[Y](1-c,c)=: u(c)$$ for any $$c\in [0,1]$$. Therefore, (1.11) can be written in terms of this unique unknown u; if moreover we set $$c=y$$, $$b=x$$ and consequently $$a=1-x-y$$, we get the functional equation (1.1), with the stated boundary conditions.

### Theorem 1.1

Let $$\alpha$$ be a positive real number. Suppose $$u:[0,1]\rightarrow {\mathbb {R}}$$ is a measurable function that satisfies (1.1) for every $$x,y \in [0,1)$$ such that $$x+y\in [0,1]$$. Then, there exists $$\lambda \in {\mathbb {R}}$$ such that $$u(x)=\lambda s_\alpha (x)$$, where

\begin{aligned} s_1(x) = -x \log _2 x -(1-x)\log _2 (1-x) \end{aligned}

and

\begin{aligned} s_\alpha (x) =\frac{1}{1-\alpha }(x^\alpha + (1-x)^\alpha -1) \end{aligned}

when $$\alpha \ne 1$$.

By convention, $$0\log _2 0 := \lim _{x\rightarrow 0} x \log _2 x = 0$$. For $$\alpha =1$$, Theorem 1.1 is essentially Lemma 2 in [5]. Our proof depends on two independent results.
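As a quick numerical sanity check (our own, independent of the proof), one can verify that $$s_\alpha$$ does satisfy (1.1) for several values of $$\alpha$$ and admissible pairs (x, y):

```python
# Verify numerically (our own sketch) that u = s_alpha solves (1.1).
from math import log2

def s(x, alpha):
    """The functions s_alpha of Theorem 1.1 (base-2 logarithm for alpha=1)."""
    if alpha == 1:
        return -sum(t * log2(t) for t in (x, 1 - x) if t > 0)
    return (x**alpha + (1 - x)**alpha - 1) / (1 - alpha)

def defect(x, y, alpha):
    """Left-hand side minus right-hand side of (1.1) for u = s_alpha."""
    u = lambda t: s(t, alpha)
    return (u(1 - x) + (1 - x)**alpha * u(y / (1 - x))
            - u(y) - (1 - y)**alpha * u((1 - x - y) / (1 - y)))

for alpha in (1, 0.5, 2, 3.7):
    for x in (0.1, 0.3, 0.6):
        for y in (0.05, 0.2, 0.35):          # x + y <= 1 in all cases
            assert abs(defect(x, y, alpha)) < 1e-10
```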

### Theorem 1.2

(Regularity) Any measurable solution of (1.1) is infinitely differentiable on the interval (0, 1).

### Theorem 1.3

(Symmetry) Any solution of (1.1) satisfies $$u(x) = u(1-x)$$ for all $$x\in {\mathbb {Q}}\cap [0,1]$$.

The first is proved analytically, by means of standard techniques in the field of functional equations (cf. [1, 5, 9]), and the second by a novel geometrical argument, relating the equation to the action of the modular group on the projective line.

Theorems 1.2 and 1.3 above imply that any measurable solution u of (1.1) must be symmetric, i.e. $$u(x) = u(1-x)$$ for all $$x\in [0,1]$$, and therefore

\begin{aligned} u(x) + (1-x)^\alpha u\left( \frac{y}{1-x}\right) = u(y) + (1-y)^\alpha u\left( \frac{x}{1-y}\right) \end{aligned}
(1.12)

whenever $$x,y\in [0,1)$$ and $$x+y \in [0,1]$$. When $$\alpha =1$$, this equation is called “the fundamental equation of information theory”; it first appeared in the work of Tverberg [9], who deduced it from a characterization of an “information function” that not only supposed a version of the chain rule, but also the invariance of the function under permutations of its arguments. Daróczy introduced the fundamental equation for general $$\alpha >0$$, and showed that it can be deduced from an axiomatic characterization analogous to that of Tverberg, which again supposed invariance under permutations along with a deformed chain rule akin to (1.8); see [3, Thm. 5].

For $$\alpha = 1$$, Tverberg [9] showed that, if $$u:[0,1]\rightarrow {\mathbb {R}}$$ is symmetric, Lebesgue integrable and satisfies (1.12), then it must be a multiple of $$s_1(x)$$. In [5], Kannappan and Ng weakened the regularity condition, showing that all measurable solutions of (1.12) have the form $$u(x) = As_1(x) + Bx$$ (where A and B are arbitrary real constants), which reduces to $$u(x) = As_1(x)$$ when u is symmetric. In fact, they solved some generalizations of the fundamental equation, proving among other things that, when $$\alpha =1$$, the only measurable solutions of (1.1) are multiples of $$s_1(x)$$.

For $$\alpha \ne 1$$, Daróczy [3] established that any $$u:[0,1]\rightarrow {\mathbb {R}}$$ that satisfies (1.12) and $$u(0)=u(1)$$ has the form (see Note 2)

\begin{aligned} u(x) = \frac{u(1/2)}{2^{1-\alpha }-1} (x^\alpha + (1-x)^\alpha - 1), \end{aligned}
(1.13)

without any hypotheses on the regularity of u. The proof begins by showing that any solution of (1.12) satisfies $$u(0)=0$$ (setting $$x=0$$), hence also $$u(1)=0$$ by the hypothesis $$u(0)=u(1)$$, and must therefore be symmetric (setting $$y=1-x$$). Since we are able to prove symmetry of the solutions of (1.1) restricted to rational arguments without any regularity hypothesis, we also get the following result.

### Corollary 1.4

For any $$\alpha \in (0,\infty )\setminus \{1\}$$, the only functions $$u:{\mathbb {Q}}\cap [0,1]\rightarrow {\mathbb {R}}$$ that satisfy equation (1.1) are multiples of $$s_\alpha$$.

### Proof

Set $$x=0$$ in (1.1) to conclude that $$u(1)=0$$, and $$y=0$$ to obtain $$u(0)=0$$. Moreover, u must be symmetric (Theorem 1.3), hence it must fulfill (1.12) when the arguments are rational. Given these facts, Daróczy’s proof in [3, p. 39] applies with no modifications when restricted to $$p,q\in {\mathbb {Q}}$$. $$\square$$

More details on the characterization of information functions by means of functional equations can be found in the classical reference [1], which gives a detailed historical introduction. Reference [2] summarizes more recent developments in connection with homological algebra.

It is quite remarkable that Theorem 1.1 serves as a fundamental result to prove that, up to a multiplicative constant, $$\{S_\alpha [X]\}_{X\in {\mathcal {S}}}$$ is the only collection of measurable functionals (not necessarily invariant under permutations) that satisfy the corresponding $$\alpha$$-chain rule, for any generic set of random variables $${\mathcal {S}}$$. In order to do this, one introduces an adapted cohomology theory, called information cohomology [2], where the chain rule corresponds to the 1-cocycle condition and thus has an algebro-topological meaning. The details can be found in the dissertation [10].

## The modular group

The group $$G= SL_2({\mathbb {Z}})/\{\pm I\}$$ is called the modular group; it is the image of $$SL_2({\mathbb {Z}})$$ in $$PGL_2({\mathbb {R}})$$. We keep using the matrix notation for the images in this quotient. We make G act on $$P^1({\mathbb {R}})$$ as follows: an element

\begin{aligned} g=\begin{pmatrix} a &{} b \\ c &{} d \end{pmatrix}\in G \end{aligned}

acting on $$[x:y]\in P^1({\mathbb {R}})$$ (homogeneous coordinates) gives

\begin{aligned} g[x:y] = [ax + by:cx+dy]. \end{aligned}

Let S and T be the elements of G defined by the matrices

\begin{aligned} S = \begin{pmatrix} 0 &{} -1 \\ 1 &{} 0 \end{pmatrix}\quad \text { and }\quad T = \begin{pmatrix} 1 &{} 1 \\ 0 &{} 1 \end{pmatrix}. \end{aligned}
(2.1)

The group G is generated by S and T [6, Ch. VII, Th. 2]; in fact, one can prove that $$\langle S,T;S^2, (ST)^3\rangle$$ is a presentation of G.
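The relations in this presentation are easy to verify over the integers; the following sketch (ours) checks that $$S^2$$ and $$(ST)^3$$ equal $$\pm I$$, i.e. the identity of G.

```python
# Check (our own sketch) the relations S^2 = (ST)^3 = 1 in G = SL_2(Z)/{±I},
# i.e. that the integer matrices equal ±I.
def mul(m, n):
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def proj_eq(m, n):
    """Equality in the quotient by {±I}: m == n or m == -n."""
    return m == n or m == tuple(tuple(-x for x in row) for row in n)

I = ((1, 0), (0, 1))
S = ((0, -1), (1, 0))        # cf. (2.1)
T = ((1, 1), (0, 1))

ST = mul(S, T)
assert proj_eq(mul(S, S), I)              # S^2 = -I
assert proj_eq(mul(ST, mul(ST, ST)), I)   # (ST)^3 = -I
```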

## Regularity: proof of Theorem 1.2

Lemma 3 in [5] implies that u is locally bounded on (0, 1) and hence locally integrable. Their proof is for $$\alpha =1$$, but the argument applies to the general case with almost no modification, just replacing

\begin{aligned} |u(y)| = \left| u(1-x) + (1-x) u \left( \frac{y}{1-x}\right) - (1-y) u\left( \frac{1-x-y}{1-y}\right) \right| \le 3N, \end{aligned}

where x, y are such that $$|u(1-x)|\le N$$, $$\left| u \left( \frac{y}{1-x}\right) \right| \le N$$ and $$\left| u\left( \frac{1-x-y}{1-y}\right) \right| \le N$$, by

\begin{aligned} |u(y)| = \left| u(1-x) + (1-x)^\alpha u \left( \frac{y}{1-x}\right) - (1-y)^\alpha u\left( \frac{1-x-y}{1-y}\right) \right| \le 3N, \end{aligned}

which is evidently valid too whenever $$x,y\in (0,1)$$.

To prove the differentiability, we also follow the method in [5]—already present in [9]. Let us fix an arbitrary $$y_0\in (0,1)$$; then, it is possible to choose $$s,t\in (0,1)$$, $$s<t$$, such that

\begin{aligned} \frac{1-y-s}{1-y}, \frac{1-y-t}{1-y}\in (0,1), \end{aligned}

for all y in a certain neighborhood of $$y_0$$. We integrate (1.1) with respect to x between s and t, to obtain

\begin{aligned} (s-t)u(y) = \int _{1-t}^{1-s} u(x) \mathrm {d}x + y^{1+\alpha }\int _{\frac{y}{1-s}}^{\frac{y}{1-t}} \frac{u(z)}{z^{2+\alpha }} \mathrm {d}z + (1-y)^{1+\alpha } \int _{\frac{1-y-s}{1-y}}^{\frac{1-y-t}{1-y}} u(z) \mathrm {d}z. \end{aligned}
(3.1)

The continuity of the right-hand side of (3.1) as a function of y at $$y_0$$ implies that u is continuous at $$y_0$$, and therefore on (0, 1). Once u is known to be continuous, the right-hand side of (3.1) is differentiable with respect to y, hence so is u at $$y_0$$. An iterated application of this argument shows that u is infinitely differentiable on (0, 1).

## Symmetry: proof of Theorem 1.3

Define the function $$h:[0,1]\rightarrow {\mathbb {R}}$$ through

\begin{aligned} \forall x \in [0,1], \quad h(x) = u(x)-u(1-x). \end{aligned}
(4.1)

Observe that h is anti-symmetric around 1/2, that is, we have

\begin{aligned} \forall x\in [0,1], \quad h(x) = -h(1-x). \end{aligned}
(4.2)

Let now $$z\in \left[ \frac{1}{2}, 1\right]$$ be arbitrary and use the substitutions $$x=1-z$$ and $$y=1-z$$ in (1.1) to derive the identity

\begin{aligned} \forall z\in \left[ \frac{1}{2}, 1\right] , \quad h(z) = z^\alpha h(2-z^{-1}). \end{aligned}
(4.3)

Using the anti-symmetry of h to modify the right-hand side of the previous equation, we also deduce that

\begin{aligned} \forall z \in \left[ \frac{1}{2}, 1\right] , \quad h(z) = - z^\alpha h (z^{-1}-1). \end{aligned}
(4.4)

Setting $$x=0$$ (respectively $$y=0$$) in (1.1), we conclude that $$u(1)=0$$ (resp. $$u(0)=0$$). Hence, the function h is subject to the boundary conditions $$h(0)=h(1)=0$$. From (4.3), it follows that $$h(1/2)=h(0)/2^\alpha = 0$$. If the domain of h is extended to the whole real line by imposing 1-periodicity:

\begin{aligned} \forall x \in ]-\infty , \infty [, \quad h(x+1) = h(x), \end{aligned}
(4.5)

a similar argument can be used to determine the value of h at any rational argument. To that end, it is important to establish first that (4.3) and (4.4) hold for the extended function.

### Theorem 4.1

The function h, extended periodically to $${\mathbb {R}}$$, satisfies the equations

\begin{aligned} \forall x \in {\mathbb {R}}, \quad&h(x) = |x|^\alpha h \left( \frac{2x-1}{x}\right) , \end{aligned}
(4.6)
\begin{aligned} \forall x \in {\mathbb {R}}, \quad&h(x) = -|x|^\alpha h \left( \frac{1-x}{x}\right) . \end{aligned}
(4.7)

We establish first the anti-symmetry around 1/2 of the extended h (Lemma 4.2), which implies that (4.7) follows from (4.6); the latter is a consequence of Lemmas 4.3–4.7.

### Lemma 4.2

\begin{aligned} \forall x \in {\mathbb {R}}, \quad h(x) = -h(1-x). \end{aligned}

### Proof

We write $$x = [x]+\{x\}$$, where $$[x]$$ denotes the integer part of x and $$\{x\}:= x-[x]\in [0,1)$$ its fractional part. Then,

\begin{aligned} h(x) \overset{{{(4.5)}}}{=} h(\{x\}) \overset{{{(4.2)}}}{=} - h(1-\{x\}) \overset{{{(4.5)}}}{=} -h(1-\{x\}-[x]) = - h(1-x). \end{aligned}

$$\square$$

### Lemma 4.3

\begin{aligned} \forall x\in [1,2], \quad h(x) = x^\alpha h (2-x^{-1}). \end{aligned}
(4.8)

### Proof

Since h is periodic, (4.8) is equivalent to

\begin{aligned} \forall x\in [1,2], \quad h(x-1) = x^\alpha h(1-x^{-1}), \end{aligned}
(4.9)

and the change of variables $$v=x-1$$ gives

\begin{aligned} \forall v \in [0,1], \quad h(v) = (v+1)^\alpha h\left( \frac{v}{v+1}\right) . \end{aligned}
(4.10)

Note that $$1 - \frac{v}{v+1} = \frac{1}{v+1} \in [1/2,1]$$ whenever $$v\in [0,1]$$. Therefore,

\begin{aligned} h\left( \frac{v}{v+1}\right) \overset{{{(\text {Lemma }4.2)}}}{=} -h\left( \frac{1}{v+1}\right) \overset{{{(4.4)}}}{=} \left( \frac{1}{v+1}\right) ^\alpha h(v). \end{aligned}

This establishes (4.10). $$\square$$

### Lemma 4.4

\begin{aligned} \forall x \in [2,\infty [, \quad h(x) = x^\alpha h(2-x^{-1}). \end{aligned}
(4.11)

### Proof

If $$x\in [2,\infty [$$, then $$1 - \frac{1}{x} \in \left[ \frac{1}{2}, 1\right]$$ and we can apply Eq. (4.3) to obtain

\begin{aligned} h\left( 1 - \frac{1}{x}\right) \overset{{{(4.3)}}}{=} \left( 1 - \frac{1}{x}\right) ^\alpha h \left( 2-\left( 1 - \frac{1}{x}\right) ^{-1}\right) = \left( \frac{x-1}{x}\right) ^\alpha h\left( 1- \frac{1}{x-1}\right) . \end{aligned}
(4.12)

We prove (4.11) by recurrence. The case $$x\in [1,2]$$ corresponds to Lemma 4.3. Suppose it is valid on $$[n-1,n]$$, for certain $$n\ge 2$$; for $$x\in [n,n+1]$$,

\begin{aligned} h(x)&\overset{{{(4.5)}}}{=} h(x-1) \\&\overset{{{(\text {rec.})}}}{=} (x-1)^\alpha h(2-(x-1)^{-1}) \\&\overset{{{(4.5)}}}{=} (x-1)^\alpha h(1-(x-1)^{-1}) \\&\overset{{{(4.12)}}}{=} x^\alpha h(1-x^{-1})\\&\overset{{{(4.5)}}}{=} x^\alpha h(2-x^{-1}). \end{aligned}

$$\square$$

### Lemma 4.5

\begin{aligned} \forall x \in \left[ 0,\frac{1}{2}\right] , \quad h(x) = -x^\alpha h(x^{-1}-1). \end{aligned}
(4.13)

### Proof

The previous lemma and periodicity imply that $$h(x-1) = x^\alpha h(1-x^{-1})$$ for all $$x\ge 2$$, i.e.

\begin{aligned} \forall v \ge 1, \quad h(v) = (v+1)^\alpha h\left( 1-\frac{1}{v+1}\right) . \end{aligned}
(4.14)

Then, for $$v\ge 1$$,

\begin{aligned} h\left( \frac{1}{v+1}\right) \overset{{{(\text {Lem. }4.2)}}}{=} -h\left( 1-\frac{1}{v+1}\right) \overset{{{(4.14)}}}{=} - \left( \frac{1}{v+1}\right) ^\alpha h(v). \end{aligned}
(4.15)

We set $$y=(v+1)^{-1}\in \left( 0,\frac{1}{2}\right]$$. Equation (4.15) reads

\begin{aligned} \forall y \in \left( 0,\frac{1}{2}\right] , \quad h(y) = -y^\alpha h(y^{-1}-1). \end{aligned}
(4.16)

Since $$h(0) = 0$$, the lemma is proved. $$\square$$

### Lemma 4.6

\begin{aligned} \forall x \in \left[ 0,\frac{1}{2}\right] , \quad h(x) = x^\alpha h(2-x^{-1}). \end{aligned}
(4.17)

### Proof

Immediately deduced from the previous lemma using the anti-symmetric property in Lemma 4.2. $$\square$$

### Lemma 4.7

\begin{aligned} \forall x \in ]-\infty ,0], \quad h(x) = |x|^\alpha h(2-x^{-1}). \end{aligned}

### Proof

On the one hand, periodicity implies that $$h(x) = h(x+1) \overset{{{(\text {Lem. }4.2)}}}{=} -h(1-(x+1)) = -h(-x)$$. On the other, for $$x\le 0$$, the preceding results imply that $$h(-x) = (-x)^\alpha h(2-(-x)^{-1})= |x|^\alpha h(2-(-x)^{-1})$$. Therefore,

\begin{aligned} h(x)&= -h(-x) = -|x|^\alpha h \left( 2+ \frac{1}{x}\right) \\&\overset{{{(\text {Lem. }4.2)}}}{=} |x|^\alpha h\left( 1-\left( 2+ \frac{1}{x}\right) \right) \overset{{{(4.5)}}}{=} |x|^\alpha h \left( 2- \frac{1}{x}\right) . \end{aligned}

$$\square$$

The transformations $$x\mapsto \frac{2x-1}{x}$$ and $$x\mapsto \frac{1-x}{x}$$ in Eqs. (4.6) and (4.7) are homographies of the real projective line $$P^1({\mathbb {R}})$$, that we denote respectively by $$\alpha$$ and $$\beta$$. They correspond to elements

\begin{aligned} A= \begin{pmatrix} 2 &{} -1 \\ 1 &{} 0 \end{pmatrix}, \quad B = \begin{pmatrix} -1 &{} 1 \\ 1 &{} 0 \end{pmatrix} \end{aligned}
(4.18)

in G, that satisfy

\begin{aligned} B^2= \begin{pmatrix} 2 &{} -1 \\ -1 &{} 1 \end{pmatrix}, \quad BA^{-1} = \begin{pmatrix} -1 &{} 1 \\ 0 &{} 1 \end{pmatrix}. \end{aligned}
(4.19)

This last matrix corresponds to $$x\mapsto 1-x$$.
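These correspondences can be checked with exact rational arithmetic; the sketch below is ours (`act` implements the homogeneous-coordinate action of Sect. 2 on points $$[x:1]$$).

```python
# Check (our own sketch) that A and B from (4.18) act as the homographies
# alpha: x -> (2x-1)/x and beta: x -> (1-x)/x, and that B A^{-1} acts
# as x -> 1 - x, as stated after (4.19).
from fractions import Fraction as F

def act(m, x):
    """Action of a 2x2 matrix on [x : 1] in homogeneous coordinates."""
    (a, b), (c, d) = m
    return (a * x + b) / (c * x + d)

def mul(m, n):
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

A = ((2, -1), (1, 0))        # cf. (4.18)
B = ((-1, 1), (1, 0))
A_inv = ((0, 1), (-1, 2))    # inverse of A in SL_2(Z)

for x in (F(1, 2), F(2, 3), F(-5, 7)):
    assert act(A, x) == (2 * x - 1) / x      # the homography alpha
    assert act(B, x) == (1 - x) / x          # the homography beta
    assert act(mul(B, A_inv), x) == 1 - x    # x -> 1 - x
```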

### Lemma 4.8

The matrices A and $$B^2$$ generate G.

### Proof

Let

\begin{aligned} P= S^{-1}T^{-1}=\begin{pmatrix} 0 &{} 1 \\ -1 &{} 1 \end{pmatrix}. \end{aligned}

One has

\begin{aligned} P A P^{-1} = \begin{pmatrix} 1 &{} -1 \\ 0 &{} 1 \end{pmatrix}, \end{aligned}
(4.20)

and

\begin{aligned} P B^{-2} P^{-1} = \begin{pmatrix} 3 &{} -1 \\ 1 &{} 0 \end{pmatrix} . \end{aligned}
(4.21)

Therefore, $$PAP^{-1} = T^{-1}$$ and $$S=T^{-3} P B^{-2} P^{-1}$$. Inverting these relations, we obtain

\begin{aligned} T = PA^{-1}P^{-1}; \quad S=PA^3B^{-2}P^{-1}. \end{aligned}
(4.22)

Let X be an arbitrary element of G. Since $$Y=PXP^{-1}\in G$$ and G is generated by S and T, the element Y is a word in S and T. In consequence, X is a word in $$P^{-1}SP$$ and $$P^{-1}TP$$, which in turn are words in A and $$B^2$$. $$\square$$

It is possible to find explicit formulas for S and T in terms of A and $$B^2$$. Since $$P=S^{-1}T^{-1}$$, we deduce that $$PSP^{-1}= S^{-1}T^{-1}STS$$ and $$PTP^{-1} = S^{-1}T^{-1}TTS = S^{-1}TS$$. Hence, by virtue of (4.22),

\begin{aligned} S&= P^{-1}S^{-1}T^{-1}STS P\\&=(P^{-1} S^{-1}P)(P^{-1} T^{-1} P)(P^{-1} S P)(P^{-1} T P) (P^{-1} SP)\\&= B^2 A B^{-2}A^2 B^{-2} \end{aligned}

and

\begin{aligned} T&= P^{-1} S^{-1} T S P \\&= (P^{-1} S^{-1} P)(P^{-1} T P) (P^{-1} S P)\\&= B^2 A^{-1} B^{-2}. \end{aligned}
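Both words can be confirmed by direct multiplication of integer matrices; in this case the equalities even hold exactly in $$SL_2({\mathbb {Z}})$$, with no sign ambiguity (our own check):

```python
# Verify (our own sketch) the explicit words above:
# S = B^2 A B^{-2} A^2 B^{-2} and T = B^2 A^{-1} B^{-2}.
def mul(m, n):
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def word(*ms):
    """Product of a sequence of 2x2 matrices, left to right."""
    out = ((1, 0), (0, 1))
    for m in ms:
        out = mul(out, m)
    return out

A      = ((2, -1), (1, 0))
A_inv  = ((0, 1), (-1, 2))
A2     = ((3, -2), (2, -1))   # A^2
B2     = ((2, -1), (-1, 1))   # B^2, cf. (4.19)
B2_inv = ((1, 1), (1, 2))     # (B^2)^{-1}

S = ((0, -1), (1, 0))
T = ((1, 1), (0, 1))

assert word(B2, A, B2_inv, A2, B2_inv) == S
assert word(B2, A_inv, B2_inv) == T
```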

To finish our proof of Theorem 1.3, we remark that the orbit of 0 under the action of G on $$P^1({\mathbb {R}})$$ is $${\mathbb {Q}}\cup \{\infty \}$$, where $${\mathbb {Q}}\cup \{\infty \}$$ has been identified with $$\{[p:q] \in P^1({\mathbb {R}}) \mid p,q\in {\mathbb {Z}}\text { not both zero}\}\subset P^1({\mathbb {R}})$$. This is a consequence of Bézout's identity: for every point $$[p:q]\in P^1({\mathbb {R}})$$ representing a reduced fraction $$\frac{p}{q} \ne 0$$ ($$p,q \in {\mathbb {Z}}\setminus \{0\}$$ and coprime), there are two integers x, y such that $$xq - yp = 1$$. Therefore

\begin{aligned}g'=\begin{pmatrix} x &{} p \\ y &{} q \end{pmatrix} \end{aligned}

is an element of G and $$g'[0:1] = [p:q]$$. The case $$q=0$$ is covered by

\begin{aligned}\begin{pmatrix} 0 &{} 1 \\ -1 &{} 0 \end{pmatrix} [0:1] = [1:0]. \end{aligned}
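The Bezout step is effective: the extended Euclidean algorithm produces a suitable matrix from any reduced fraction. A small sketch of ours (the helper names `egcd` and `matrix_reaching` are hypothetical):

```python
# Build (our own sketch) a determinant-1 matrix g' with g'[0:1] = [p:q],
# for a reduced fraction p/q, via the extended Euclidean algorithm.
from math import gcd

def egcd(a, b):
    """Return (s, t) with s*a + t*b == gcd(a, b)."""
    if b == 0:
        return (1, 0)
    s, t = egcd(b, a % b)
    return (t, s - (a // b) * t)

def matrix_reaching(p, q):
    """For coprime p, q with q > 0: a matrix ((x, p), (y, q)), xq - yp = 1."""
    assert gcd(p, q) == 1
    s, t = egcd(q, p)              # s*q + t*p == 1
    x, y = s, -t                   # hence x*q - y*p == 1
    return ((x, p), (y, q))

g = matrix_reaching(3, 7)
(x, p), (y, q) = g
assert x * q - y * p == 1          # g lies in SL_2(Z)...
assert (p, q) == (3, 7)            # ...and sends [0:1] to its second column
```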

The extended Eqs. (4.6) and (4.7) are such that

1. For all $$x\in {\mathbb {R}}$$, if $$h(x) = 0$$ then $$h(\alpha ^{-1} x) = 0$$ and $$h(\beta ^{-1} x) = 0$$;

2. For all $$x\in {\mathbb {R}}\setminus \{0\}$$, if $$h(x) = 0$$ then $$h(\alpha x) = 0$$ and $$h(\beta x) = 0$$.

Since $$h(1/2)=0$$, the following lemma is the missing piece to establish that the extended h vanishes on $${\mathbb {Q}}$$ (and hence the original h necessarily vanishes on $$[0,1]\cap {\mathbb {Q}}$$).

### Lemma 4.9

For any $$r\in {\mathbb {Q}}\setminus \{0\}$$, there exists a finite sequence

\begin{aligned} w=(w_i)_{i=1}^n\in \{\alpha ,\beta ,\alpha ^{-1},\beta ^{-1}\}^n \end{aligned}

such that $$r=w_n\circ \cdots \circ w_1(1/2)$$ and, for all $$i\in \{1,...,n\}$$, the iterate $$x_i:=w_i\circ \cdots \circ w_1(1/2)$$ does not equal 0 or $$\infty$$.

### Proof

Since the orbit in $$P^1({\mathbb {R}})$$ of 1/2 by the group of homographies generated by A and $$B^2$$ (i.e. G itself) contains the whole set of rational numbers $${\mathbb {Q}}$$, there exists a w such that $$r=w_n \circ \cdots \circ w_1(1/2)$$, where each $$w_i$$ equals $$\alpha$$, $$\beta$$ or one of their inverses.

If some iterate equals 0 or $$\infty$$, the sequence w can be modified to avoid this. Let $$i\in \{0,...,n\}$$ be the largest index such that $$x_i\in \{0,\infty \}$$; in fact, $$i<n$$ because $$r\ne 0,\infty$$.

• If $$x_i = 0$$, then $$x_{i+1} \in \{1/2,1\}$$ (the possibility $$x_{i+1}=\infty$$ is ruled out by the choice of i). In the case $$x_{i+1}=1/2$$, the equality $$r=w_n\circ \cdots \circ w_{i+2}(1/2)$$ holds, and when $$x_{i+1}=1$$, we have $$r=w_n\circ \cdots \circ w_{i+2}\circ \beta (1/2)$$.

• If $$x_i = \infty$$, then $$x_{i+1}\in \{2,-1\}$$ (again, $$x_{i+1}=0$$ is ruled out). When $$x_{i+1}=2$$, we have $$r=w_n\circ \cdots \circ w_{i+2} \circ \beta \circ \alpha \circ \beta ^{-1}\circ \beta ^{-1}(1/2)$$, and when $$x_{i+1}=-1$$, it also holds that $$r=w_n\circ \cdots \circ w_{i+2} \circ \alpha \circ \alpha \circ \beta ^{-1}\circ \beta ^{-1}(1/2)$$.

$$\square$$
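The two detours used in the proof can be double-checked with exact arithmetic; below is our own verification, writing $$\alpha$$, $$\beta$$ and $$\beta ^{-1}$$ as functions on $${\mathbb {Q}}$$.

```python
# Verify (our own sketch) the rewritten words in the proof of Lemma 4.9:
# beta∘alpha∘beta^{-1}∘beta^{-1}(1/2) = 2 and
# alpha∘alpha∘beta^{-1}∘beta^{-1}(1/2) = -1.
from fractions import Fraction as F

alpha_h  = lambda x: (2 * x - 1) / x   # the homography alpha
beta_h   = lambda x: (1 - x) / x       # the homography beta
beta_inv = lambda x: 1 / (x + 1)       # beta^{-1}

half = F(1, 2)
assert beta_h(alpha_h(beta_inv(beta_inv(half)))) == 2
assert alpha_h(alpha_h(beta_inv(beta_inv(half)))) == -1
```

Along the way the iterates are 2/3, 3/5 and 1/3, so none of them equals 0 or $$\infty$$, as the lemma requires.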

## Notes

1. Assumption A can be deduced from B if one identifies X with (X, X) through the diagonal map $$E_X \rightarrow E_X\times E_X, \;x\mapsto (x,x)$$ and then evaluates (1.9) at $$Y=X$$ and $$p=\delta _{x_0}$$, for any $$x_0\in E_X$$.

2. In fact, he supposes $$u(1/2)=1$$, but the argument works in general.

## References

1. Aczél, J., Daróczy, Z.: On Measures of Information and Their Characterizations. Mathematics in Science and Engineering. Academic Press, New York (1975)

2. Baudot, P., Bennequin, D.: The homological nature of entropy. Entropy 17(5), 3253–3318 (2015)

3. Daróczy, Z.: Generalized information functions. Inf. Control 16(1), 36–51 (1970)

4. Havrda, J., Charvát, F.: Quantification method of classification processes. Concept of structural $$a$$-entropy. Kybernetika 3(1), 30–35 (1967)

5. Kannappan, P., Ng, C.T.: Measurable solutions of functional equations related to information theory. Proc. Am. Math. Soc. 38(2), 303–310 (1973)

6. Serre, J.: A Course in Arithmetic. Graduate Texts in Mathematics. Springer, Berlin (1973)

7. Shannon, C.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423, 623–656 (1948)

8. Tsallis, C.: Introduction to Nonextensive Statistical Mechanics: Approaching a Complex World. Springer, New York (2009)

9. Tverberg, H.: A new derivation of the information function. Math. Scand. 6, 297–298 (1958)

10. Vigneaux, J.P.: Topology of Statistical Systems: A Cohomological Approach to Information Theory. Ph.D. Thesis, Université Paris Diderot (2019)

## Acknowledgements

Open access funding provided by Projekt DEAL.

## Author information


### Corresponding author

Correspondence to Juan Pablo Vigneaux.

### Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article was written while the second author was a graduate student at the Université Paris Diderot.


Bennequin, D., Vigneaux, J.P. A functional equation related to generalized entropies and the modular group. Aequat. Math. (2020). https://doi.org/10.1007/s00010-020-00717-2


### Keywords

• Generalized entropies
• Shannon entropy
• Tsallis entropy
• Modular group
• Functional equation
• Information cohomology

### Mathematics Subject Classification

• Primary 97I70
• 94A17