A functional equation related to generalized entropies and the modular group

Bennequin, Daniel; Vigneaux, Juan Pablo

doi:10.1007/s00010-020-00717-2

Open access
Published: 02 March 2020

Volume 94, pages 1201–1212, (2020)

Aequationes mathematicae

Abstract

We solve a functional equation connected to the algebraic characterization of generalized information functions. To prove the symmetry of the solution, we study a related system of functional equations, which involves two homographies. These transformations generate the modular group, and this fact plays a crucial role in solving the system. The method suggests a more general relation between conditional probabilities and arithmetic.

1 Motivation and results

In this paper, we study the measurable solutions $u:[0,1]\rightarrow {\mathbb {R}}$ of the functional equation

$$\begin{aligned} u(1-x)+(1-x)^\alpha u\left( \frac{y}{1-x}\right) = u(y) + (1-y)^\alpha u\left( \frac{1-x-y}{1-y}\right) \end{aligned}$$

(1.1)

for all $x,y \in [0,1)$ such that $x+y\in [0,1]$. The parameter $\alpha $ can take any positive real value.

This equation appears in the context of algebraic characterizations of information functions. Given a random variable X whose range is a finite set $E_X$, a measure of its “information content” is supposed to be a function $f[X]: \Delta (E_X) \rightarrow {\mathbb {R}}$, where $\Delta (E_X)$ denotes the set of probabilities on $E_X$,

$$\begin{aligned} \Delta (E_X) =\left\{ \,p:E_X\rightarrow [0,1] \, \big \vert \, \sum _{x\in E_X}p(x) = 1\,\right\} . \end{aligned}$$

(1.2)

The most important example of such a function is the Shannon-Gibbs entropy

$$\begin{aligned} S_1[X](p) := -\sum _{x\in E_X} p(x)\log p(x), \end{aligned}$$

(1.3)

where $0\log 0$ equals 0 by convention.

Shannon entropy satisfies a remarkable property, called the chain rule, that we now describe. Let X (resp. Y) be a variable with range $E_X$ (resp. $E_Y$); both $E_X$ and $E_Y$ are supposed to be finite sets. The couple (X, Y) takes values in a subset $E_{XY}$ of $E_X\times E_Y$, and any probability p on $E_{XY}$ induces by marginalization laws $X_*p$ on $E_X$ and $Y_*p$ on $E_Y$. For instance,

$$\begin{aligned} X_*p(x) = \sum _{y: (x,y)\in E_{XY}} p(x,y). \end{aligned}$$

(1.4)

The chain rule corresponds to the identities

$$\begin{aligned} S_1[(X,Y)](p)&= S_1[X](X_*p) + \sum _{x\in E_X} X_*p(x) S_1[Y](Y_*(p|_{X=x})), \end{aligned}$$

(1.5)

$$\begin{aligned} S_1[(X,Y)](p)&= S_1[Y](Y_*p) + \sum _{y\in E_Y} Y_*p(y) S_1[X](X_*(p|_{Y=y})), \end{aligned}$$

(1.6)

where $p|_{X=x}$ denotes the conditional probability $y\mapsto p(x,y)/X_*p(x)$. These identities reflect the third axiom used by Shannon to characterize an information measure H: “if a choice be broken down into two successive choices, the original H should be the weighted sum of the individual values of H” [7].

There is a deformed version of Shannon entropy, called generalized entropy of degree $\alpha $ [1, Ch. 6]. For any $\alpha \in (0,\infty )\setminus \{1\}$, it is defined as

$$\begin{aligned} S_\alpha [X](p) := \frac{1}{1-\alpha }\left( \sum _{x\in E_X} p(x)^\alpha -1\right) . \end{aligned}$$

(1.7)

This function was introduced by Havrda and Charvát [4]. Constantino Tsallis popularized its use in physics, as the fundamental quantity of non-extensive statistical mechanics [8], so $S_\alpha $ is also called Tsallis $\alpha $-entropy. It satisfies a deformed version of the chain rule:

$$\begin{aligned} S_\alpha [(X,Y)](p) = S_\alpha [X](X_*p) + \sum _{x\in E_X} (X_*p(x))^\alpha S_\alpha [Y](Y_*(p|_{X=x})). \end{aligned}$$

(1.8)
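As a numerical sanity check of (1.8), and not as part of the original argument, the following Python sketch evaluates both sides on random joint laws. The helper names (`tsallis`, `chain_rule_gap`, `random_joint`) and the choice of a full product range $E_{XY}=E_X\times E_Y$ are assumptions made for illustration only.

```python
import random

def tsallis(probs, alpha):
    """Generalized entropy of degree alpha (alpha != 1) of a finite law, as in (1.7)."""
    return (sum(p ** alpha for p in probs) - 1.0) / (1.0 - alpha)

def chain_rule_gap(joint, alpha):
    """Left side minus right side of (1.8) for a joint law {(x, y): probability}."""
    xs = sorted({x for x, _ in joint})
    # Marginal law X_* p.
    marg_x = {x: sum(p for (a, _), p in joint.items() if a == x) for x in xs}
    lhs = tsallis(joint.values(), alpha)
    rhs = tsallis(marg_x.values(), alpha)
    for x in xs:
        if marg_x[x] > 0:
            # Conditional law of Y given X = x.
            cond = [p / marg_x[x] for (a, _), p in joint.items() if a == x]
            rhs += marg_x[x] ** alpha * tsallis(cond, alpha)
    return lhs - rhs

def random_joint(nx, ny, rng):
    """A random probability on {0..nx-1} x {0..ny-1} (illustrative choice)."""
    weights = {(x, y): rng.random() for x in range(nx) for y in range(ny)}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

rng = random.Random(0)
gaps = [abs(chain_rule_gap(random_joint(3, 4, rng), alpha)) for alpha in (0.5, 2.0, 3.7)]
print(max(gaps))  # expected to be of the order of floating-point rounding
```

The gap vanishes identically in exact arithmetic because the terms $\sum_x (X_*p(x))^\alpha$ cancel between the two summands on the right-hand side.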

Suppose now that, given $\alpha >0$, we want to find the most general functions f[X]—for a given collection of finite random variables X—such that

A. $f[X](\delta )=0$ whenever $\delta $ is a Dirac measure (a measure concentrated on a singleton), which means that variables with deterministic outputs do not give (new) information when measured;
B. the generalized $\alpha $-chain rule holds, i.e. for any variables X and Y with finite range (see Note 1),
$$\begin{aligned} f[(X,Y)](p)&= f[X](X_*p) + \sum _{x\in E_X} (X_*p(x))^\alpha f[Y](Y_*(p|_{X=x})), \end{aligned}$$
(1.9)
$$\begin{aligned} f[(X,Y)](p)&= f[Y](Y_*p) + \sum _{y\in E_Y} (Y_*p(y))^\alpha f[X](X_*(p|_{Y=y})). \end{aligned}$$
(1.10)

The simplest non-trivial case corresponds to $E_X=E_Y=\{0,1\}$ and $E_{XY}=\{(0,0),(1,0),(0,1)\}$; a probability p on $E_{XY}$ is a triple $p(0,0)=a$, $p(1,0)=b$, $p(0,1)=c$, such that $X_*p=(a+c,b)$ and $Y_*p=(a+b,c)$. The equality between the right-hand sides of (1.9) and (1.10) reads

$$\begin{aligned}&f[X](a+c,b) + (1-b)^\alpha f[Y]\left( \frac{a}{1-b},\frac{c}{1-b}\right) \nonumber \\&\quad =f[Y](a+b,c)+(1-c)^\alpha f[X]\left( \frac{a}{1-c},\frac{b}{1-c}\right) , \end{aligned}$$

(1.11)

for any triple $(a,b,c)\in [0,1]^3$ such that $a+b+c=1$. Setting $a=0$ and using assumption A, we conclude that $f[X](c,1-c)=f[Y](1-c,c)=: u(c)$ for any $c\in [0,1]$. Therefore, (1.11) can be written in terms of this unique unknown u; if moreover we set $c=y$, $b=x$ and consequently $a=1-x-y$, we get the functional equation (1.1), with the stated boundary conditions.
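For the reader's convenience, the two substitutions can be written out explicitly; this is only a restatement of the step above, using (1.11) and assumption A. With $a=0$ and $b=1-c$, Eq. (1.11) becomes

$$\begin{aligned} f[X](c,1-c) + c^\alpha f[Y](0,1) = f[Y](1-c,c) + (1-c)^\alpha f[X](0,1), \end{aligned}$$

and $f[Y](0,1)=f[X](0,1)=0$ by assumption A, since these arguments are Dirac measures. With $c=y$, $b=x$, $a=1-x-y$, the four terms of (1.11) read

$$\begin{aligned} f[X](a+c,b)&= f[X](1-x,x) = u(1-x),&f[Y]\left( \frac{a}{1-b},\frac{c}{1-b}\right)&= u\left( \frac{y}{1-x}\right) , \\ f[Y](a+b,c)&= f[Y](1-y,y) = u(y),&f[X]\left( \frac{a}{1-c},\frac{b}{1-c}\right)&= u\left( \frac{1-x-y}{1-y}\right) . \end{aligned}$$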

The main result of this article is the following.

Theorem 1.1

Let $\alpha $ be a positive real number. Suppose $u:[0,1]\rightarrow {\mathbb {R}}$ is a measurable function that satisfies (1.1) for every $x,y \in [0,1)$ such that $x+y\in [0,1]$. Then, there exists $\lambda \in {\mathbb {R}}$ such that $u(x)=\lambda s_\alpha (x)$, where

$$\begin{aligned} s_1(x) = -x \log _2 x -(1-x)\log _2 (1-x) \end{aligned}$$

and

$$\begin{aligned} s_\alpha (x) =\frac{1}{1-\alpha }(x^\alpha + (1-x)^\alpha -1) \end{aligned}$$

when $\alpha \ne 1$.
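The easy direction of Theorem 1.1, namely that $s_\alpha $ does satisfy (1.1), can be checked numerically. The sketch below is only an illustration under our own naming (`s`, `residual`); it samples admissible pairs $(x,y)$ and reports the largest discrepancy between the two sides of (1.1).

```python
import math
import random

def s(x, alpha):
    """The function s_alpha of Theorem 1.1, with the convention 0 * log 0 = 0."""
    if alpha == 1:
        return -sum(t * math.log2(t) for t in (x, 1.0 - x) if t > 0)
    return (x ** alpha + (1.0 - x) ** alpha - 1.0) / (1.0 - alpha)

def residual(u, x, y, alpha):
    """Left-hand side minus right-hand side of (1.1) at the point (x, y)."""
    lhs = u(1.0 - x) + (1.0 - x) ** alpha * u(y / (1.0 - x))
    rhs = u(y) + (1.0 - y) ** alpha * u((1.0 - x - y) / (1.0 - y))
    return lhs - rhs

rng = random.Random(1)
worst = 0.0
for alpha in (0.5, 1, 2.0, 3.0):
    for _ in range(1000):
        x = rng.uniform(0.0, 0.99)
        y = rng.uniform(0.0, 0.99 - x)  # keeps x + y < 1 and y < 1
        worst = max(worst, abs(residual(lambda t: s(t, alpha), x, y, alpha)))
print(worst)  # expected to be of the order of floating-point rounding
```

The converse, that every measurable solution is a multiple of $s_\alpha $, is the actual content of the theorem and is not something such a check can establish.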

By convention, $0\log _2 0 := \lim _{x\rightarrow 0} x \log _2 x = 0$. For $\alpha =1$, Theorem 1.1 is essentially Lemma 2 in [5]. Our proof depends on two independent results.

Theorem 1.2

(Regularity) Any measurable solution of (1.1) is infinitely differentiable on the interval (0, 1).

Theorem 1.3

(Symmetry) Any solution of (1.1) satisfies $u(x) = u(1-x)$ for all $x\in {\mathbb {Q}}\cap [0,1]$.

The first is proved analytically, by means of standard techniques in the field of functional equations (cf. [1, 5, 9]), and the second by a novel geometrical argument, relating the equation to the action of the modular group on the projective line.

Theorems 1.2 and 1.3 above imply that any measurable solution u of (1.1) must be symmetric, i.e. $u(x) = u(1-x)$ for all $x\in [0,1]$, and therefore

$$\begin{aligned} u(x) + (1-x)^\alpha u\left( \frac{y}{1-x}\right) = u(y) + (1-y)^\alpha u\left( \frac{x}{1-y}\right) \end{aligned}$$

(1.12)

whenever $x,y\in [0,1)$ and $x+y \in [0,1]$. When $\alpha =1$, this equation is called “the fundamental equation of information theory”; it first appeared in the work of Tverberg [9], who deduced it from a characterization of an “information function” that not only supposed a version of the chain rule, but also the invariance of the function under permutations of its arguments. Daróczy introduced the fundamental equation for general $\alpha >0$, and showed that it can be deduced from an axiomatic characterization analogous to that of Tverberg, that again supposed invariance under permutations along with a deformed chain rule akin to (1.8), see [3, Thm. 5].

For $\alpha = 1$, Tverberg [9] showed that, if $u:[0,1]\rightarrow {\mathbb {R}}$ is symmetric, Lebesgue integrable and satisfies (1.12), then it must be a multiple of $s_1(x)$. In [5], Kannappan and Ng weakened the regularity condition, showing that all measurable solutions of (1.12) have the form $u(x) = As_1(x) + Bx$ (where A and B are arbitrary real constants), which reduces to $u(x) = As_1(x)$ when u is symmetric. In fact, they solved some generalizations of the fundamental equation, proving among other things that, when $\alpha =1$, the only measurable solutions of (1.1) are multiples of $s_1(x)$.

For $\alpha \ne 1$, Daróczy [3] established that any $u:[0,1]\rightarrow {\mathbb {R}}$ that satisfies (1.12) and $u(0)=u(1)$ has the form (see Note 2)

$$\begin{aligned} u(x) = \frac{u(1/2)}{2^{1-\alpha }-1} (x^\alpha + (1-x)^\alpha - 1), \end{aligned}$$

(1.13)

without any hypotheses on the regularity of u. The proof starts by proving that any solution of (1.12) must satisfy $u(0)=0$ (setting $x=0$), and hence be symmetric (setting $y=1-x$). Since we are able to prove symmetry of the solutions of (1.1) restricted to rational arguments without any regularity hypothesis, we also get the following result.

Corollary 1.4

For any $\alpha \in (0,\infty )\setminus \{1\}$, the only functions $u:{\mathbb {Q}}\cap [0,1]\rightarrow {\mathbb {R}}$ that satisfy equation (1.1) are multiples of $s_\alpha $.

Proof

Set $x=0$ in (1.1) to conclude that $u(1)=0$, and $y=0$ to obtain $u(0)=0$. Moreover, u must be symmetric (Theorem 1.3), hence it must fulfill (1.12) when the arguments are rational. Given these facts, Daróczy’s proof in [3, p. 39] applies with no modifications when restricted to $p,q\in {\mathbb {Q}}$. $\square $

More details on the characterization of information functions by means of functional equations can be found in the classical reference [1], which gives a detailed historical introduction. Reference [2] summarizes more recent developments in connection with homological algebra.

It is quite remarkable that Theorem 1.1 serves as a fundamental result to prove that, up to a multiplicative constant, $\{S_\alpha [X]\}_{X\in {\mathcal {S}}}$ is the only collection of measurable functionals (not necessarily invariant under permutations) that satisfy the corresponding $\alpha $-chain rule, for any generic set of random variables ${\mathcal {S}}$. In order to do this, one introduces an adapted cohomology theory, called information cohomology [2], where the chain rule corresponds to the 1-cocycle condition and thus has an algebro-topological meaning. The details can be found in the dissertation [10].

2 The modular group

The group $G= SL_2({\mathbb {Z}})/\{\pm I\}$ is called the modular group; it is the image of $SL_2({\mathbb {Z}})$ in $PGL_2({\mathbb {R}})$. We keep using the matrix notation for the images in this quotient. We make G act on $P^1({\mathbb {R}})$ as follows: an element

$$\begin{aligned} g=\begin{pmatrix} a &{} b \\ c &{} d \end{pmatrix}\in G \end{aligned}$$

acting on $[x:y]\in P^1({\mathbb {R}})$ (homogeneous coordinates) gives

$$\begin{aligned} g[x:y] = [ax + by:cx+dy]. \end{aligned}$$

Let S and T be the elements of G defined by the matrices

$$\begin{aligned} S = \begin{pmatrix} 0 &{} -1 \\ 1 &{} 0 \end{pmatrix}\quad \text { and }\quad T = \begin{pmatrix} 1 &{} 1 \\ 0 &{} 1 \end{pmatrix}. \end{aligned}$$

(2.1)

The group G is generated by S and T [6, Ch. VII, Th. 2]; in fact, one can prove that $\langle S,T;S^2, (ST)^3\rangle $ is a presentation of G.
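The two relations of this presentation are easy to confirm with exact integer arithmetic. The following sketch is an illustration with helper names of our own (`matmul`, `same_in_G`); it checks that $S^2$ and $(ST)^3$ equal $-I$ as matrices, hence the identity in the quotient G.

```python
def matmul(m, n):
    """Product of two 2x2 integer matrices stored as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def same_in_G(m, n):
    """Equality in SL_2(Z)/{+-I}: the two matrices agree up to a global sign."""
    neg = tuple(tuple(-v for v in row) for row in n)
    return m == n or m == neg

I = ((1, 0), (0, 1))
S = ((0, -1), (1, 0))
T = ((1, 1), (0, 1))

ST = matmul(S, T)
S2 = matmul(S, S)
ST3 = matmul(ST, matmul(ST, ST))
print(S2, ST3)                               # both equal -I as matrices
print(same_in_G(S2, I), same_in_G(ST3, I))   # hence trivial in G
```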

3 Regularity: proof of Theorem 1.2

Lemma 3 in [5] implies that u is locally bounded on (0, 1) and hence locally integrable. Their proof is for $\alpha =1$, but the argument applies to the general case with almost no modification, just replacing

$$\begin{aligned} |u(y)| = \left| u(1-x) + (1-x) u \left( \frac{y}{1-x}\right) - (1-y) u\left( \frac{1-x-y}{1-y}\right) \right| \le 3N, \end{aligned}$$

where x, y are such that $|u(1-x)|\le N$, $\left| u \left( \frac{y}{1-x}\right) \right| \le N$ and $\left| u\left( \frac{1-x-y}{1-y}\right) \right| \le N$, by

$$\begin{aligned} |u(y)| = \left| u(1-x) + (1-x)^\alpha u \left( \frac{y}{1-x}\right) - (1-y)^\alpha u\left( \frac{1-x-y}{1-y}\right) \right| \le 3N, \end{aligned}$$

which is evidently valid too whenever $x,y\in (0,1)$.

To prove the differentiability, we also follow the method in [5] (already present in [9]). Let us fix an arbitrary $y_0\in (0,1)$; then, it is possible to choose $s,t\in (0,1)$, $s<t$, such that

$$\begin{aligned} \frac{1-y-s}{1-y}, \frac{1-y-t}{1-y}\in (0,1), \end{aligned}$$

for all y in a certain neighborhood of $y_0$. We integrate (1.1) with respect to x, between s and t, to obtain

$$\begin{aligned} (t-s)u(y) = \int _{1-t}^{1-s} u(x) \mathrm {d}x + y^{1+\alpha }\int _{\frac{y}{1-s}}^{\frac{y}{1-t}} \frac{u(z)}{z^{2+\alpha }} \mathrm {d}z + (1-y)^{1+\alpha } \int _{\frac{1-y-s}{1-y}}^{\frac{1-y-t}{1-y}} u(z) \mathrm {d}z. \end{aligned}$$

(3.1)
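The integration step can be tested numerically on a known solution. Integrating (1.1) in x over $[s,t]$ and substituting $z=y/(1-x)$ and $z=(1-x-y)/(1-y)$ gives $(t-s)u(y)$ on one side and, on the other, $\int _{1-t}^{1-s}u$, plus $y^{1+\alpha }\int u(z)z^{-(2+\alpha )}\mathrm {d}z$ over $[\frac{y}{1-s},\frac{y}{1-t}]$, plus $(1-y)^{1+\alpha }\int u$ taken from $\frac{1-y-s}{1-y}$ to $\frac{1-y-t}{1-y}$. The sketch below checks this form for $u=s_\alpha $; the Simpson rule, the sample values of $\alpha , s, t, y$ and all names are assumptions chosen for illustration.

```python
def s_alpha(x, alpha):
    """s_alpha for alpha != 1 (see Theorem 1.1)."""
    return (x ** alpha + (1.0 - x) ** alpha - 1.0) / (1.0 - alpha)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals; a > b yields the signed integral."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

alpha, s, t, y = 2.5, 0.1, 0.2, 0.3   # illustrative values with s < t and y / (1 - t) < 1
u = lambda x: s_alpha(x, alpha)

lhs = (t - s) * u(y)
term1 = simpson(u, 1 - t, 1 - s)
term2 = y ** (1 + alpha) * simpson(lambda z: u(z) / z ** (2 + alpha), y / (1 - s), y / (1 - t))
term3 = (1 - y) ** (1 + alpha) * simpson(u, (1 - y - s) / (1 - y), (1 - y - t) / (1 - y))
gap = abs(lhs - (term1 + term2 + term3))
print(gap)  # expected to be tiny, limited by the quadrature error
```

For $\alpha =1$ the exponent $2+\alpha $ reduces to 3, which is the case treated in [5].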

The continuity of the right-hand side of (3.1) as a function of y at $y_0$ implies that u is continuous at $y_0$ and therefore on (0, 1). In turn, the continuity of u makes the right-hand side of (3.1) differentiable in y, so u is differentiable at $y_0$. An iterated application of this argument shows that u is infinitely differentiable on (0, 1).

4 Symmetry: proof of Theorem 1.3

Define the function $h:[0,1]\rightarrow {\mathbb {R}}$ through

$$\begin{aligned} \forall x \in [0,1], \quad h(x) = u(x)-u(1-x). \end{aligned}$$

(4.1)

Observe that h is anti-symmetric around 1/2, that is, we have

$$\begin{aligned} \forall x\in [0,1], \quad h(x) = -h(1-x). \end{aligned}$$

(4.2)

Let now $z\in \left[ \frac{1}{2}, 1\right] $ be arbitrary and use the substitutions $x=1-z$ and $y=1-z$ in (1.1) to derive the identity

$$\begin{aligned} \forall z\in \left[ \frac{1}{2}, 1\right] , \quad h(z) = z^\alpha h(2-z^{-1}). \end{aligned}$$

(4.3)
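In more detail (a worked restatement of the step above): for $z\in \left[ \frac{1}{2},1\right] $ the choice $x=y=1-z$ is admissible, since $x,y\in \left[ 0,\frac{1}{2}\right] $ and $x+y=2-2z\in [0,1]$. Equation (1.1) then reads

$$\begin{aligned} u(z) + z^\alpha u\left( \frac{1-z}{z}\right) = u(1-z) + z^\alpha u\left( \frac{2z-1}{z}\right) , \end{aligned}$$

and, because $1-(2-z^{-1}) = z^{-1}-1$,

$$\begin{aligned} h(z) = u(z)-u(1-z) = z^\alpha \left[ u(2-z^{-1}) - u\left( 1-(2-z^{-1})\right) \right] = z^\alpha h(2-z^{-1}). \end{aligned}$$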

Using the anti-symmetry of h to modify the right-hand side of the previous equation, we also deduce that

$$\begin{aligned} \forall z \in \left[ \frac{1}{2}, 1\right] , \quad h(z) = - z^\alpha h (z^{-1}-1). \end{aligned}$$

(4.4)

Setting $x=0$ (respectively $y=0$) in (1.1), we conclude that $u(1)=0$ (resp. $u(0)=0$). Hence, the function h is subject to the boundary conditions $h(0)=h(1)=0$. From (4.3), it follows that $h(1/2)=h(0)/2^\alpha = 0$. If the domain of h is extended to the whole real line imposing 1-periodicity:

$$\begin{aligned} \forall x \in ]-\infty , \infty [, \quad h(x+1) = h(x), \end{aligned}$$

(4.5)

a similar argument can be used to determine the value of h at any rational argument. To that end, it is important to establish first that (4.3) and (4.4) hold for the extended function.

Theorem 4.1

The function h, extended periodically to ${\mathbb {R}}$, satisfies the equations

$$\begin{aligned} \forall x \in {\mathbb {R}}, \quad&h(x) = |x|^\alpha h \left( \frac{2x-1}{x}\right) , \end{aligned}$$

(4.6)

$$\begin{aligned} \forall x \in {\mathbb {R}}, \quad&h(x) = -|x|^\alpha h \left( \frac{1-x}{x}\right) . \end{aligned}$$

(4.7)

We establish first the anti-symmetry around 1/2 of the extended h (Lemma 4.2), which implies that (4.7) follows from (4.6); the latter is a consequence of Lemmas 4.3–4.7.

Lemma 4.2

$$\begin{aligned} \forall x \in {\mathbb {R}}, \quad h(x) = -h(1-x). \end{aligned}$$

Proof

We write $x = [x]+\{x\}$, where $\{x\}:= x-[x]$. Then,

$$\begin{aligned} h(x) \overset{{{(4.5)}}}{=} h(\{x\}) \overset{{{(4.2)}}}{=} - h(1-\{x\}) \overset{{{(4.5)}}}{=} -h(1-\{x\}-[x]) = - h(1-x). \end{aligned}$$

$\square $

Lemma 4.3

$$\begin{aligned} \forall x\in [1,2], \quad h(x) = x^\alpha h (2-x^{-1}). \end{aligned}$$

(4.8)

Proof

Since h is periodic, (4.8) is equivalent to

$$\begin{aligned} \forall x\in [1,2], \quad h(x-1) = x^\alpha h(1-x^{-1}), \end{aligned}$$

(4.9)

and the change of variables $u=x-1$ gives

$$\begin{aligned} \forall u \in [0,1], \quad h(u) = (u+1)^\alpha h\left( \frac{u}{u+1}\right) . \end{aligned}$$

(4.10)

Note that $1 - \frac{u}{u+1} = \frac{1}{u+1} \in [1/2,1]$ whenever $u\in [0,1]$. Therefore,

$$\begin{aligned} h\left( \frac{u}{u+1}\right) \overset{{{(\text {Lemma }4.2)}}}{=} -h\left( \frac{1}{u+1}\right) \overset{{{(4.4)}}}{=} \left( \frac{1}{u+1}\right) ^\alpha h(u). \end{aligned}$$

This establishes (4.10). $\square $

Lemma 4.4

$$\begin{aligned} \forall x \in [2,\infty [, \quad h(x) = x^\alpha h(2-x^{-1}). \end{aligned}$$

(4.11)

Proof

If $x\in [2,\infty [$, then $1 - \frac{1}{x} \in \left[ \frac{1}{2}, 1\right] $ and we can apply Eq. (4.3) to obtain

$$\begin{aligned} h\left( 1 - \frac{1}{x}\right) \overset{{{(4.3)}}}{=} \left( 1 - \frac{1}{x}\right) ^\alpha h \left( 2-\left( 1 - \frac{1}{x}\right) ^{-1}\right) = \left( \frac{x-1}{x}\right) ^\alpha h\left( 1- \frac{1}{x-1}\right) . \end{aligned}$$

(4.12)

We prove (4.11) by induction. The case $x\in [1,2]$ corresponds to Lemma 4.3. Suppose the identity holds on $[n-1,n]$ for some $n\ge 2$; then, for $x\in [n,n+1]$,

$$\begin{aligned} h(x)&\overset{{{(4.5)}}}{=} h(x-1) \\&\overset{{{(\text {ind.})}}}{=} (x-1)^\alpha h(2-(x-1)^{-1}) \\&\overset{{{(4.5)}}}{=} (x-1)^\alpha h(1-(x-1)^{-1}) \\&\overset{{{(4.12)}}}{=} x^\alpha h(1-x^{-1})\\&\overset{{{(4.5)}}}{=} x^\alpha h(2-x^{-1}). \end{aligned}$$

$\square $

Lemma 4.5

$$\begin{aligned} \forall x \in \left[ 0,\frac{1}{2}\right] , \quad h(x) = -x^\alpha h(x^{-1}-1). \end{aligned}$$

(4.13)

Proof

The previous lemma and periodicity imply that $h(x-1) = x^\alpha h(1-x^{-1})$ for all $x\ge 2$, i.e.

$$\begin{aligned} \forall u \ge 1, \quad h(u) = (u+1)^\alpha h\left( 1-\frac{1}{u+1}\right) . \end{aligned}$$

(4.14)

Then, for $u\ge 1$,

$$\begin{aligned} h\left( \frac{1}{u+1}\right) \overset{{{(\text {Lem. }4.2)}}}{=} -h\left( 1-\frac{1}{u+1}\right) \overset{{{(4.14)}}}{=} - \left( \frac{1}{u+1}\right) ^\alpha h(u). \end{aligned}$$

(4.15)

We set $y=(u+1)^{-1}\in \left( 0,\frac{1}{2}\right] $. Equation (4.15) reads

$$\begin{aligned} \forall y \in \left( 0,\frac{1}{2}\right] , \quad h(y) = -y^\alpha h(y^{-1}-1). \end{aligned}$$

(4.16)

Since $h(0) = 0$, the lemma is proved. $\square $

Lemma 4.6

$$\begin{aligned} \forall x \in \left[ 0,\frac{1}{2}\right] , \quad h(x) = x^\alpha h(2-x^{-1}). \end{aligned}$$

(4.17)

Proof

Immediately deduced from the previous lemma using the anti-symmetric property in Lemma 4.2. $\square $

Lemma 4.7

$$\begin{aligned} \forall x \in ]-\infty ,0], \quad h(x) = |x|^\alpha h(2-x^{-1}). \end{aligned}$$

Proof

On the one hand, periodicity implies that $h(x) = h(x+1) \overset{{{(\text {Lem. }4.2)}}}{=} -h(1-(x+1)) = -h(-x)$. On the other, for $x\le 0$, the preceding results imply that $h(-x) = (-x)^\alpha h(2-(-x)^{-1})= |x|^\alpha h(2-(-x)^{-1})$. Therefore,

$$\begin{aligned} h(x)&= -h(-x) = -|x|^\alpha h \left( 2+ \frac{1}{x}\right) \\&\overset{{{(\text {Lem. }4.2)}}}{=} |x|^\alpha h\left( 1-\left( 2+ \frac{1}{x}\right) \right) \overset{{{(4.5)}}}{=} |x|^\alpha h \left( 2- \frac{1}{x}\right) . \end{aligned}$$

$\square $

The transformations $x\mapsto \frac{2x-1}{x}$ and $x\mapsto \frac{1-x}{x}$ in Eqs. (4.6) and (4.7) are homographies of the real projective line $P^1({\mathbb {R}})$, that we denote respectively by $\alpha $ and $\beta $. They correspond to elements

$$\begin{aligned} A= \begin{pmatrix} 2 &{} -1 \\ 1 &{} 0 \end{pmatrix}, \quad B = \begin{pmatrix} -1 &{} 1 \\ 1 &{} 0 \end{pmatrix} \end{aligned}$$

(4.18)

in G, that satisfy

$$\begin{aligned} B^2= \begin{pmatrix} 2 &{} -1 \\ -1 &{} 1 \end{pmatrix}, \quad BA^{-1} = \begin{pmatrix} -1 &{} 1 \\ 0 &{} 1 \end{pmatrix}. \end{aligned}$$

(4.19)

This last matrix corresponds to $x\mapsto 1-x$.
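These matrix identities can be confirmed with exact arithmetic. The sketch below is illustrative only (the helpers `matmul` and `act` are our own names); it checks that A and B act as the two homographies, recomputes $B^2$ and $BA^{-1}$, and verifies that $BA^{-1}$ acts as $x\mapsto 1-x$.

```python
from fractions import Fraction

def matmul(m, n):
    """Product of two 2x2 integer matrices stored as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def act(m, x):
    """Homography x -> (a x + b) / (c x + d), for a rational x with c x + d != 0."""
    (a, b), (c, d) = m
    return (a * x + b) / (c * x + d)

A = ((2, -1), (1, 0))
B = ((-1, 1), (1, 0))
A_inv = ((0, 1), (-1, 2))   # inverse of A, valid because det A = 1

x = Fraction(3, 7)          # arbitrary test point
print(act(A, x) == (2 * x - 1) / x, act(B, x) == (1 - x) / x)

B2 = matmul(B, B)
BA_inv = matmul(B, A_inv)
print(B2, BA_inv)
print(act(BA_inv, x) == 1 - x)
```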

Lemma 4.8

The matrices A and $B^2$ generate G.

Proof

Let

$$\begin{aligned} P= S^{-1}T^{-1}=\begin{pmatrix} 0 &{} 1 \\ -1 &{} 1 \end{pmatrix}. \end{aligned}$$

One has

$$\begin{aligned} P A P^{-1} = \begin{pmatrix} 1 &{} -1 \\ 0 &{} 1 \end{pmatrix}, \end{aligned}$$

(4.20)

and

$$\begin{aligned} P B^{-2} P^{-1} = \begin{pmatrix} 3 &{} -1 \\ 1 &{} 0 \end{pmatrix} . \end{aligned}$$

(4.21)

Therefore, $PAP^{-1} = T^{-1}$ and $S=T^{-3} P B^{-2} P^{-1}$. Inverting these relations, we obtain

$$\begin{aligned} T = PA^{-1}P^{-1}; \quad S=PA^3B^{-2}P^{-1}. \end{aligned}$$

(4.22)

Let X be an arbitrary element of G. Since $Y=PXP^{-1}\in G$ and G is generated by S and T, the element Y is a word in S and T. In consequence, X is a word in $P^{-1}SP$ and $P^{-1}TP$, which in turn are words in A and $B^2$. $\square $

It is possible to find explicit formulas for S and T in terms of A and $B^2$. Since $P=S^{-1}T^{-1}$, we deduce that $PSP^{-1}= S^{-1}T^{-1}STS$ and $PTP^{-1} = S^{-1}T^{-1}TTS = S^{-1}TS$. Hence, in virtue of (4.22),

$$\begin{aligned} S&= P^{-1}S^{-1}T^{-1}STS P\\&=(P^{-1} S^{-1}P)(P^{-1} T^{-1} P)(P^{-1} S P)(P^{-1} T P) (P^{-1} SP)\\&= B^2 A B^{-2}A^2 B^{-2} \end{aligned}$$

and

$$\begin{aligned} T&= P^{-1} S^{-1} T S P \\&= (P^{-1} S^{-1} P)(P^{-1} T P) (P^{-1} S P)\\&= B^2 A^{-1} B^{-2}. \end{aligned}$$
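The relations used in the proof of Lemma 4.8 and the two explicit words above can be double-checked by exact matrix multiplication. The following sketch is an illustration with helper names of our own; equality is tested up to a global sign, as befits the quotient G.

```python
from functools import reduce

I = ((1, 0), (0, 1))

def matmul(m, n):
    """Product of two 2x2 integer matrices stored as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def product(*ms):
    """Ordered product of several matrices."""
    return reduce(matmul, ms, I)

def inverse(m):
    """Inverse of a 2x2 integer matrix of determinant 1."""
    (a, b), (c, d) = m
    assert a * d - b * c == 1
    return ((d, -b), (-c, a))

def power(m, k):
    """Integer power, negative exponents allowed."""
    base = m if k >= 0 else inverse(m)
    return product(*([base] * abs(k)))

def same_in_G(m, n):
    """Equality up to a global sign."""
    return m == n or m == tuple(tuple(-v for v in row) for row in n)

S, T = ((0, -1), (1, 0)), ((1, 1), (0, 1))
A, B2 = ((2, -1), (1, 0)), ((2, -1), (-1, 1))
P = matmul(inverse(S), inverse(T))
Pi, B2i = inverse(P), inverse(B2)

checks = {
    "P A P^-1 = T^-1": same_in_G(product(P, A, Pi), inverse(T)),
    "S = T^-3 P B^-2 P^-1": same_in_G(product(power(T, -3), P, B2i, Pi), S),
    "S = B^2 A B^-2 A^2 B^-2": same_in_G(product(B2, A, B2i, power(A, 2), B2i), S),
    "T = B^2 A^-1 B^-2": same_in_G(product(B2, inverse(A), B2i), T),
}
for name, ok in checks.items():
    print(name, ok)
```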

To finish our proof of Theorem 1.3, we remark that the orbit of 0 by the action of G on $P^1({\mathbb {R}})$ is ${\mathbb {Q}}\cup \{\infty \}$, where ${\mathbb {Q}}\cup \{\infty \}$ has been identified with $\{[p:q] \in P^1({\mathbb {R}}) \mid p,q\in {\mathbb {Z}}\}\subset P^1({\mathbb {R}})$. This is a consequence of Bézout’s identity: for every point $[p:q]\in P^1({\mathbb {R}})$ representing a reduced fraction $\frac{p}{q} \ne 0$ ($p,q \in {\mathbb {Z}}\setminus \{0\}$ and coprime), there are two integers x, y such that $xq - yp = 1$. Therefore

$$\begin{aligned}g'=\begin{pmatrix} x &{} p \\ y &{} q \end{pmatrix} \end{aligned}$$

is an element of G and $g'[0:1] = [p:q]$. The case $q=0$ is covered by

$$\begin{aligned}\begin{pmatrix} 0 &{} 1 \\ -1 &{} 0 \end{pmatrix} [0:1] = [1:0]. \end{aligned}$$
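The Bézout step is constructive. As an illustration, the sketch below computes such a matrix $g'$ with the extended Euclidean algorithm; the function names and the normalization $q>0$ are our own assumptions.

```python
def ext_gcd(a, b):
    """Return (g, s, t) with s * a + t * b == g, where |g| = gcd(a, b)."""
    if b == 0:
        return (a, 1, 0)
    g, s, t = ext_gcd(b, a % b)
    return (g, t, s - (a // b) * t)

def matrix_sending_zero_to(p, q):
    """Integer matrix ((x, p), (y, q)) with x*q - y*p == 1; it maps [0:1] to [p:q].

    Assumes p, q coprime with q > 0 (a reduced fraction p/q).
    """
    g, x, y = ext_gcd(q, -p)          # x*q + y*(-p) == g
    if g < 0:
        g, x, y = -g, -x, -y
    if g != 1:
        raise ValueError("p and q must be coprime")
    return ((x, p), (y, q))

for p, q in [(3, 7), (-5, 8), (1, 2), (22, 7)]:
    (x, _), (y, _) = m = matrix_sending_zero_to(p, q)
    print(m, x * q - y * p)           # the determinant printed last is 1
```

The image of $[0:1]$ under a matrix is its second column, which is why placing $(p,q)$ there suffices.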

The extended Eqs. (4.6) and (4.7) are such that

1.
For all $x\in {\mathbb {R}}$, if $h(x) = 0$ then $h(\alpha ^{-1} x) = 0$ and $h(\beta ^{-1} x) = 0$;
2.
For all $x\in {\mathbb {R}}\setminus \{0\}$, if $h(x) = 0$ then $h(\alpha x) = 0$ and $h(\beta x) = 0$.

Since $h(1/2)=0$, the following lemma is the missing piece to establish that the extended h vanishes on ${\mathbb {Q}}$ (and hence the original h necessarily vanishes on $[0,1]\cap {\mathbb {Q}}$).

Lemma 4.9

For any $r\in {\mathbb {Q}}\setminus \{0\}$, there exists a finite sequence

$$\begin{aligned} w=(w_i)_{i=1}^n\in \{\alpha ,\beta ,\alpha ^{-1},\beta ^{-1}\}^n \end{aligned}$$

such that $r=w_n\circ \cdots \circ w_1(1/2)$ and, for all $i\in \{1,...,n\}$, the iterate $x_i:=w_i\circ \cdots \circ w_1(1/2)$ does not equal 0 or $\infty $.

Proof

Since the orbit in $P^1({\mathbb {R}})$ of 1/2 by the group of homographies generated by A and $B^2$ (i.e. G itself) contains the whole set of rational numbers ${\mathbb {Q}}$, there exists a w such that $r=w_n \circ \cdots \circ w_1(1/2)$, where each $w_i$ equals $\alpha $, $\beta $ or one of their inverses.

If some iterate equals 0 or $\infty $, the sequence w can be modified to avoid this. Let $i\in \{0,...,n\}$ be the largest index such that $x_i\in \{0,\infty \}$; in fact, $i<n$ because $r\ne 0,\infty $.

If $x_i = 0$, then $x_{i+1} \in \{1/2,1\}$ (the possibility $x_{i+1}=\infty $ is ruled out by the choice of i). In the case $x_{i+1}=1/2$, the equality $r=w_n\circ \cdots \circ w_{i+2}(1/2)$ holds, and when $x_{i+1}=1$, we have $r=w_n\circ \cdots \circ w_{i+2}\circ \beta (1/2)$.
If $x_i = \infty $, then $x_{i+1}\in \{2,-1\}$ (again, $x_{i+1}=0$ is ruled out). When $x_{i+1}=2$, we have $ r=w_n\circ \cdots \circ w_{i+2} \circ \beta \circ \alpha \circ \beta ^{-1}\circ \beta ^{-1}(1/2)$, and when $x_{i+1}=-1$, it also holds that $r=w_n\circ \cdots \circ w_{i+2} \circ \alpha \circ \alpha \circ \beta ^{-1}\circ \beta ^{-1}(1/2)$.

$\square $
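Lemma 4.9 can be explored experimentally. The sketch below is an illustration rather than a proof, with names of our own choosing: it runs a breadth-first search from 1/2 over the four moves $\alpha $, $\beta $, $\alpha ^{-1}$, $\beta ^{-1}$, discarding every iterate equal to 0 or $\infty $, and it replays the two replacement words used in the proof above. Words are listed in the order of application, $w_1$ first.

```python
from collections import deque
from fractions import Fraction

MOVES = {
    "a":  lambda x: (2 * x - 1) / x,   # alpha
    "b":  lambda x: (1 - x) / x,       # beta
    "a-": lambda x: 1 / (2 - x),       # alpha^{-1}
    "b-": lambda x: 1 / (1 + x),       # beta^{-1}
}

def step(name, x):
    """Apply one move; return None when the image is 0 or infinity."""
    try:
        y = MOVES[name](x)
    except ZeroDivisionError:          # image is the point at infinity
        return None
    return y if y != 0 else None

def replay(word, x=Fraction(1, 2)):
    """Apply the moves of `word` in order, starting from 1/2."""
    for name in word:
        x = step(name, x)
        if x is None:
            return None
    return x

def find_word(target, max_len=8):
    """Shortest admissible word from 1/2 to target, or None within max_len."""
    start = Fraction(1, 2)
    if target == start:
        return []
    seen, queue = {start}, deque([(start, [])])
    while queue:
        x, word = queue.popleft()
        if len(word) == max_len:
            continue
        for name in MOVES:
            y = step(name, x)
            if y is None or y in seen:
                continue
            if y == target:
                return word + [name]
            seen.add(y)
            queue.append((y, word + [name]))
    return None

print(replay(["b-", "b-", "a", "b"]), replay(["b-", "b-", "a", "a"]))  # 2 and -1
for r in (Fraction(1), Fraction(2), Fraction(-1), Fraction(1, 3)):
    print(r, find_word(r))
```

The bound `max_len` only limits the search; the lemma itself guarantees that some admissible word exists for every nonzero rational.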

Notes

1. Assumption A can be deduced from B if one identifies X with (X, X) through the diagonal map $E_X \rightarrow E_X\times E_X, \;x\mapsto (x,x)$ and then evaluates (1.9) at $Y=X$ and $p=\delta _{x_0}$, for any $x_0\in E_X$.
2. In fact, he supposes $u(1/2)=1$, but the argument works in general.

References

1. Aczél, J., Daróczy, Z.: On Measures of Information and Their Characterizations. Mathematics in Science and Engineering. Academic Press, New York (1975)
2. Baudot, P., Bennequin, D.: The homological nature of entropy. Entropy 17(5), 3253–3318 (2015)
3. Daróczy, Z.: Generalized information functions. Inf. Control 16(1), 36–51 (1970)
4. Havrda, J., Charvát, F.: Quantification method of classification processes. Concept of structural $a$-entropy. Kybernetika 3(1), 30–35 (1967)
5. Kannappan, P., Ng, C.T.: Measurable solutions of functional equations related to information theory. Proc. Am. Math. Soc. 38(2), 303–310 (1973)
6. Serre, J.-P.: A Course in Arithmetic. Graduate Texts in Mathematics. Springer, Berlin (1973)
7. Shannon, C.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423, 623–656 (1948)
8. Tsallis, C.: Introduction to Nonextensive Statistical Mechanics: Approaching a Complex World. Springer, New York (2009)
9. Tverberg, H.: A new derivation of the information function. Math. Scand. 6, 297–298 (1958)
10. Vigneaux, J.P.: Topology of Statistical Systems: A Cohomological Approach to Information Theory. Ph.D. Thesis, Université Paris Diderot (2019)

Acknowledgements

Open access funding provided by Projekt DEAL.

Author information

Authors and Affiliations

Institut de Mathématiques de Jussieu-Paris Rive Gauche (IMJ-PRG), Université de Paris, 8 place Aurélie Némours, 75013, Paris, France
Daniel Bennequin
Max-Planck-Institute for Mathematics in the Sciences, Inselstraße 22, 04103, Leipzig, Germany
Juan Pablo Vigneaux

Corresponding author

Correspondence to Juan Pablo Vigneaux.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article was written while the second author was a graduate student at the Université Paris Diderot.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Bennequin, D., Vigneaux, J.P. A functional equation related to generalized entropies and the modular group. Aequat. Math. 94, 1201–1212 (2020). https://doi.org/10.1007/s00010-020-00717-2

Received: 14 October 2019
Revised: 23 January 2020
Published: 02 March 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s00010-020-00717-2
