Quantifying the degree of average contraction of Collatz orbits



We here elaborate on a quantitative argument to support the validity of the Collatz conjecture, also known as the \((3x+1)\) or Syracuse conjecture. The analysis is structured as follows. First, three distinct fixed points are found for the third iterate of the Collatz map, which hence organise in a period 3 orbit of the original map. These are 1, 2 and 4, the elements which define the unique attracting cycle, as hypothesised by Collatz. To carry out the calculation we write the positive integers in modulo 8 (mod8), obtain a closed analytical form for the associated map and determine the transitions that yield contracting or expanding iterates in the original, infinite-dimensional, space of positive integers. Then, we consider a Markov chain which runs on the reduced space of mod8 congruence classes of integers. The transition probabilities of the Markov chain are computed from the deterministic map, by employing a measure that is invariant for the map itself. Working in this setting, we demonstrate that the stationary distribution sampled by the stochastic system induces a contracting behaviour for the orbits of the deterministic map on the original space of the positive integers. Sampling the equilibrium distribution on the congruence classes mod\(8^m\) for any m, which amounts to arbitrarily reducing the degree of imposed coarse graining, returns an identical conclusion.


Keywords: Collatz conjecture · Number theory · Markov process · Ergodic dynamical systems

1 Introduction

The Collatz conjecture is named after Lothar Collatz, who first proposed it in 1937 [1]. The conjecture is also known as the \((3x+1)\) conjecture, the Ulam conjecture (after Stanislaw Ulam), Kakutani’s problem (after Shizuo Kakutani), the Thwaites conjecture (after Sir Bryan Thwaites), Hasse’s algorithm (after Helmut Hasse), or the Syracuse problem [1]. It can be formulated as an innocent-looking problem of arithmetic. The beauty of the conjecture emanates from its apparent, tantalising simplicity, which however hides formidable challenges for those who attempt to grasp its deep-rooted essence. To date the conjecture remains unsolved, despite the sustained efforts of mathematicians to give it a rigorous proof. The problem can be stated as follows. Take any positive integer n. If n is even, divide it by 2 to get n/2. If n is odd, multiply it by 3 and add 1 to obtain \(3n + 1\). In formulae:
$$\begin{aligned} \forall n\in \mathbb {N}\quad T(n)= {\left\{ \begin{array}{ll} \frac{n}{2}&{}\quad \text {if}\, n\, \text {is even}\\ 3n+1&{}\quad \text {if} \,n \,\text {is odd}. \end{array}\right. } \end{aligned}$$
Repeating the process iteratively, the map is believed to converge to a period-3 orbit formed by the triad \(\{1,2,4\}\). Equivalently, the conjecture states that the Collatz map will always reach 1, no matter what integer number one starts with. Numerical experiments have confirmed the validity of the conjecture for extraordinarily large values of the starting integer n [2].
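As a minimal illustration of the definition above, the map T and the descent of an orbit to 1 can be sketched in a few lines of Python. The function names `collatz_step` and `collatz_orbit`, as well as the `max_steps` safeguard, are our own choices for this sketch and are not part of the original formulation.

```python
def collatz_step(n):
    """One application of the Collatz map T to a positive integer n."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return n // 2 if n % 2 == 0 else 3 * n + 1


def collatz_orbit(n, max_steps=10_000):
    """Iterate T from n until 1 is reached or max_steps is exhausted."""
    orbit = [n]
    while orbit[-1] != 1 and len(orbit) <= max_steps:
        orbit.append(collatz_step(orbit[-1]))
    return orbit


print(collatz_orbit(6))                       # [6, 3, 10, 5, 16, 8, 4, 2, 1]
print([collatz_step(k) for k in (1, 4, 2)])   # [4, 2, 1], i.e. the cycle 1 -> 4 -> 2 -> 1
```

The second line of output makes the period-3 orbit explicit: T maps 1 to 4, 4 to 2 and 2 back to 1.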

In this paper, we provide a novel argument to support the validity of the Collatz conjecture. To anticipate our findings, we shall demonstrate that the third iterate \(T^{\circ 3}\) of the Collatz map admits three fixed points, 1, 2 and 4. These elements define the, supposedly unique, attracting cycle conjectured by Collatz. The third iterate map is naturally defined on the mod8 congruence classes of positive integers. We thus quantify the factor of relative compression or expansion that results from the application of \(T^{\circ 3}\) on each of the eight congruence classes obtained under the mod8 operation. In the second part of the paper, we show that orbits are on average bound to shrink asymptotically in size, thus heading towards the putative equilibrium. We will further enhance the resolution of the measure by reducing to an arbitrary extent the degree of imposed coarse graining, i.e. working on the congruence classes mod\(8^m\) for any chosen m. Working in this generalised setting, we will prove that the average Collatz dynamics is contracting for m as large as sought, namely when shrinking the congruence classes arbitrarily close to the singletons corresponding to each integer.

Markov processes based on congruence classes invariant under application of T have been previously considered in [3, 4] and revisited by Lagarias in his comprehensive survey on the \((3x+1)\) problem [1]. It is however the combined usage of (i) the third iterate map \(T^{\circ 3}\), (ii) the representation of numbers in mod8, (iii) the idea of employing a Markov chain constructed from \(T^{\circ 3}\) via a suitably defined measure, that allows us to draw a rigorous bound for the contraction factor, which is not just heuristically guessed.

To guide the reader through the text we shall hereafter provide a schematic outline of the main steps involved in the analysis, by making explicit reference to specific key results.
  • We will begin by defining the third iterate of the Collatz map hereby named \(S=T^{\circ 3}\).

  • We will determine the action of S on integers expressed in mod8. We will obtain class-dependent, \(\mathscr {B}(i,8) \, (i=0,\ldots ,7)\), expansion/contraction factors that exemplify the action of S, see Eq. (3). Working in this setting we will also show that 1, 2, 4 are the only fixed points of the deterministic map S. The subsequent analysis is targeted to showing that the trajectories of S are bound to converge on average to one of the above fixed points.

  • To this end we first introduce a finite states Markov chain which runs on the eight congruence classes \(\mathscr {B}(i,8) \, (i=0,\ldots ,7)\). The transition probabilities are given by Eq. (8) and have been obtained using the S-invariant measure \(\mu _{inv}\) on the classes \(\mathscr {B}(j,8^m)\), \(m\ge 1\) and \(j=0,\ldots ,8^m-1\).

  • The measure \(\mu _{inv}\) is defined by Eq. (12) (or equivalently Eq. (14)). The invariance of the measure under S is proved in Theorem 1.

  • Since the transition probabilities are computed from the S-invariant measure, it is possible to draw conclusions on the iterates of S (namely its restriction in mod8) by iterating forward the Markov process. This observation follows from a straightforward application of the Chapman–Kolmogorov equation, as discussed in Proposition 4. The explicit form of the stochastic matrix \(Q^*\) that characterises the introduced Markov chain is given in Proposition 5. The stochastic chain does not account for the specificity of 1, 2, 4, the equilibria of S. It will hence allow us to elaborate on the out-of-equilibrium dynamics of S, prior to (possible) convergence to the asymptotic Collatz equilibrium.

  • The stationary distribution of the Markov chain is computed and given by formula (29). Recall that by iterating forward the Markov chain one can inspect the equilibrium dynamics of S, in its mod8 representation, see Proposition 4.

  • By using the expansion/contraction factors associated to each of the classes \(\mathscr {B}(i,8) (i=0,\ldots ,7)\) one can show that the deterministic trajectories are on average contracting. This is substantiated by formula (30).

  • The analysis is generalised by working on the congruence classes mod\(8^m\), for any m. By operating in this setting, we will prove that the average Collatz dynamics is contracting for arbitrarily large m, i.e. when shrinking the size of the congruence classes as sought. Remarkably, the estimated upper bound for the contraction factor is shown to be independent of m.

2 From the deterministic map to a stochastic framework

The aim of this section is to introduce the tools used to derive our conclusion, through three intermediate steps, that we here outline for convenience. In Sect. 2.1 we will define and analyse the third iterate of the deterministic Collatz map. Then, in Sects. 2.2, 2.3 and 2.4 we will introduce a finite states Markov process, whose transition probabilities come from the aforementioned third iterate map. To construct the Markov chain we employ an invariant measure of the deterministic map. In Sect. 3 we will study the stationary distribution of the proposed Markov chain and infer robust constraints for the Collatz (deterministic) dynamics. In particular, Collatz orbits are on average contracting. Finally, we will expand on previous results by reformulating the dynamics on the mod\(8^m\) classes, \(\forall m\), and prove the average Collatz dynamics to be contracting at any level of imposed coarse graining.

2.1 The third iterate of the Collatz map

We begin by remarking that \(\{1,2,4\}\) is a period-3 orbit of the Collatz map. It is hence quite natural to operate with the third iterate of the original map. Working in this context, the elements of the Collatz cycle should emerge as distinct fixed points of the third iterate map. Furthermore, we progress in the analysis with a mod8 representation of the natural numbers, a choice that makes it possible to cast the sought map in a rather compact form. In practical terms, this amounts to organising the visited numbers in eight different congruence classes, \(\mathscr {B}(i,8)\) for \(i\in \{0,\ldots , 7\}\), each containing the positive integers that yield the same remainder after a Euclidean division by 8. Mathematically, the \(\mathscr {B}(i,8)\) class is defined as:
$$\begin{aligned} \mathscr {B}(i,8):=\{n\in \mathbb {N}:\exists m\in \mathbb {N}\cup \{0\},\; n=i+8m\}. \end{aligned}$$
For the sake of simplicity, we term S the third iterate of the Collatz map, namely \(S:=T^{\circ 3}\). The following Proposition makes explicit the action of S on a generic positive integer n. The outcome depends on the specific mod8 class to which n belongs.

Proposition 1

Let n be any positive integer belonging to \(\mathscr {B}(i,8)\) for some \(i\in \{0,\ldots ,7\}\), then the third iterate map S is explicitly given by
$$\begin{aligned} \forall n\in \mathbb {N}\quad S(n)= {\left\{ \begin{array}{ll} \frac{n}{8}&{}\quad \text {if}\quad \, n\in \mathscr {B}(0,8)\\ \frac{6n+2}{8}&{}\quad \text {if}\quad \, n\in \mathscr {B}(1,8)\\ \frac{6n+4}{8}&{}\quad \text {if}\quad \, n\in \mathscr {B}(2,8)\\ \frac{36n+20}{8}&{}\quad \text {if}\quad \, n\in \mathscr {B}(3,8)\\ \frac{6n+8}{8}&{}\quad \text {if}\quad \, n\in \mathscr {B}(4,8)\\ \frac{6n+2}{8}&{}\quad \text {if}\quad \, n\in \mathscr {B}(5,8)\\ \frac{6n+4}{8}&{}\quad \text {if}\quad \, n\in \mathscr {B}(6,8)\\ \frac{36n+20}{8}&{}\quad \text {if}\quad \, n\in \mathscr {B}(7,8). \end{array}\right. } \end{aligned}$$
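The closed form of Proposition 1 can be cross-checked numerically against three explicit applications of T. The sketch below is only a sanity check on a finite range, not a proof; the names `T`, `COEFFS` and `S_closed_form` are ours, and the coefficient pairs are read off the eight branches of the formula above.

```python
def T(n):
    """The Collatz map."""
    return n // 2 if n % 2 == 0 else 3 * n + 1


# (a, b) such that S(n) = (a*n + b)/8 on the class B(i,8), for i = 0..7
COEFFS = [(1, 0), (6, 2), (6, 4), (36, 20), (6, 8), (6, 2), (6, 4), (36, 20)]


def S_closed_form(n):
    """Third iterate of T computed from the class-dependent closed form."""
    a, b = COEFFS[n % 8]
    numerator = a * n + b
    assert numerator % 8 == 0   # the division by 8 is always exact
    return numerator // 8


# Compare with T applied three times on a finite range of starting values.
mismatches = [n for n in range(1, 100_001) if T(T(T(n))) != S_closed_form(n)]
print(mismatches)   # an empty list means the two definitions agree on this range
```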

One can easily prove that the fixed points of S are 1, 2 and 4 corresponding to the 3-cycle \(\{1,2,4\}\) of the original Collatz map. This is achieved by solving the fixed point equation \(S(n)=n\) in the interval of interest \(n \ge 1\). It can be straightforwardly proved that no additional fixed points exist for the map S(n).
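The fixed-point computation can be made explicit. On each class \(\mathscr {B}(i,8)\) the map S is affine, \(S(n)=(an+b)/8\), so that \(S(n)=n\) reduces to the linear equation \(n=b/(8-a)\); a candidate is accepted only if it is a positive integer lying in the class it was computed for. The following sketch, with names of our own choosing, carries out this check with exact rational arithmetic.

```python
from fractions import Fraction

# (a, b) such that S(n) = (a*n + b)/8 on the class B(i,8), for i = 0..7
COEFFS = [(1, 0), (6, 2), (6, 4), (36, 20), (6, 8), (6, 2), (6, 4), (36, 20)]

fixed_points = []
for i, (a, b) in enumerate(COEFFS):
    # On class i the equation (a*n + b)/8 = n has the single solution n = b/(8 - a).
    n = Fraction(b, 8 - a)
    is_positive_integer = n.denominator == 1 and n >= 1
    # The candidate only counts if it actually belongs to the class B(i,8).
    if is_positive_integer and n % 8 == i:
        fixed_points.append(int(n))

print(sorted(fixed_points))   # [1, 2, 4]
```

The classes with \(a=36\) yield a negative candidate, the class \(i=0\) yields \(n=0\), and the classes \(i=5,6\) yield the candidates 1 and 2, which do not belong to those classes; only 1, 2 and 4 survive.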

Notice that Eq. (3) can be cast in a more compact form as follows
$$\begin{aligned} S(n)=\frac{m_i n+r_i}{8}\quad \text { if }\quad n\in \mathscr {B}(i,8) \quad (i=0,\ldots ,7), \end{aligned}$$
where the integers \((m_i)_{0\le i\le 7}\) and \((r_i)_{0\le i\le 7}\) are given by
$$\begin{aligned} m_0= & {} 1,\quad m_1=m_2=m_4=m_5=m_6=6\quad \text { and }\quad m_3=m_7=36 \end{aligned}$$
$$\begin{aligned} r_0= & {} 0,\quad r_1=r_5=2,\quad r_2=r_6=4,\quad r_4=8\quad \text { and }\quad r_3=r_7=20. \end{aligned}$$
As a side remark, we observe that S is thus in the form of a generalised Collatz map [3]. In the following we will need the explicit value of \(x_i=S(i)={(m_i i+r_i)}/{8}\) for \(i=0,\ldots ,7\), that is
$$\begin{aligned} x_0=0,\quad x_1=1,\quad x_2=2,\quad x_3=16,\quad x_4=4,\quad x_5=4,\quad x_6=5\quad \text { and }\quad x_7=34. \end{aligned}$$

2.2 A finite states Markov process

The (finite states) Markov process that we are going to introduce considers the aforementioned congruence classes, \(\mathscr {B}(i,8)\), as a finite alphabet. The transition probabilities among different states follow the deterministic map S(n), provided one works with a suitable probability space \((\mathbb {N},\mu _{inv})\), for some S-invariant measure \(\mu _{inv}\) that we will introduce hereafter. More precisely, for any given pair of classes \(\mathscr {B}(i,8)\) \((i=0,\ldots ,7)\) and \(\mathscr {B}(j,8)\) \((j=0,\ldots ,7)\), the probability \(q^*_{ij}\) of being initially in \(\mathscr {B}(i,8)\) and then landing in \(\mathscr {B}(j,8)\), that is the conditional probability \(P[S(x)\in \mathscr {B}(j,8)| x\in \mathscr {B}(i,8)]\), is given by:
$$\begin{aligned} q^*_{ij}:=\frac{\mu _{inv}[\mathscr {B}(i,8)\cap S^{-1}\mathscr {B}(j,8)]}{\mu _{inv}[\mathscr {B}(i,8)]}\quad (i,j=0,\ldots ,7). \end{aligned}$$
To compute the above transition probabilities, one needs to explicitly determine \(S^{-1}\mathscr {B}(j,8)\), \(j=0,\ldots , 7\). To gather this information we start with a preliminary remark:

Remark 1

(On the solution of linear congruences) Let us recall a basic fact about linear congruences: given integers a, b and n, the equation
$$\begin{aligned} ax\equiv b \quad \textit{mod n}, \end{aligned}$$
can be solved if and only if \(d=gcd(a,n)\) (gcd stands for the greatest common divisor) is a divisor of b. In this case the number of distinct solutions modulo n is given by d.
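The statement of Remark 1 can be illustrated by brute force on small moduli. The helper `solve_linear_congruence` below is a hypothetical name introduced for this sketch; it simply enumerates the residues modulo n, which is adequate for the modulus 8 used in what follows.

```python
from math import gcd


def solve_linear_congruence(a, b, n):
    """All x in {0, ..., n-1} with a*x congruent to b modulo n (brute force)."""
    return [x for x in range(n) if (a * x - b) % n == 0]


# Three sample instances modulo 8: two solvable ones and one without solutions.
for a, b, n in [(6, 6, 8), (36, 0, 8), (6, 1, 8)]:
    solutions = solve_linear_congruence(a, b, n)
    d = gcd(a, n)
    # Solvable if and only if d divides b, and then there are exactly d solutions.
    expected_count = d if b % d == 0 else 0
    print((a, b, n), solutions, len(solutions) == expected_count)
```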

We are now in a position to prove the following result:

Proposition 2

Let \(j=0,\ldots , 7\), then \(S^{-1}\mathscr {B}(j,8)\) is the union of disjoint congruence classes mod64, \(\mathscr {B}(l_j,64)\), where the indexes \(l_j\) depend on the mod8 congruence class j.

In explicit form:
$$\begin{aligned} S^{-1}\mathscr {B}(0,8)= & {} \mathscr {B}(0,64)\cup \mathscr {B}(10,64)\cup \mathscr {B}(42,64)\cup \mathscr {B}(3,64)\cup \mathscr {B}(19,64)\nonumber \\&\cup \, \mathscr {B}(35,64)\cup \mathscr {B}(51,64)\cup \mathscr {B}(20,64)\cup \mathscr {B}(52,64) \nonumber \\&\cup \, \mathscr {B}(21,64)\cup \mathscr {B}(53,64)\nonumber \\ S^{-1}\mathscr {B}(1,8)= & {} \mathscr {B} (1,64)\cup \mathscr {B}(33,64)\cup \mathscr {B}(22,64)\cup \mathscr {B}(54,64)\cup \mathscr {B}(8,64)\nonumber \\ S^{-1}\mathscr {B}(2,8)= & {} \mathscr {B} (2,64)\cup \mathscr {B}(34,64)\cup \mathscr {B}(12,64)\cup \mathscr {B}(44,64)\cup \mathscr {B}(13,64)\nonumber \\&\cup \, \mathscr {B}(45,64)\cup \mathscr {B}(7,64)\cup \mathscr {B}(23,64)\cup \mathscr {B}(39,64) \nonumber \\&\cup \, \mathscr {B}(55,64)\cup \mathscr {B}(16,64)\nonumber \\ S^{-1}\mathscr {B}(3,8)= & {} \mathscr {B}(25,64)\cup \mathscr {B}(57,64)\cup \mathscr {B}(14,64)\cup \mathscr {B}(46,64)\cup \mathscr {B}(24,64)\nonumber \\ S^{-1}\mathscr {B}(4,8)= & {} \mathscr {B}(26,64)\cup \mathscr {B}(58,64)\cup \mathscr {B}(11,64)\cup \mathscr {B}(27,64)\cup \mathscr {B}(43,64)\nonumber \\&\cup \, \mathscr {B}(59,64)\cup \mathscr {B} (4,64)\cup \mathscr {B}(36,64)\cup \mathscr {B}(5,64) \nonumber \\&\cup \, \mathscr {B}(37,64)\cup \mathscr {B}(32,64)\nonumber \\ S^{-1}\mathscr {B}(5,8)= & {} \mathscr {B}(17,64)\cup \mathscr {B}(49,64)\cup \mathscr {B}(6,64)\cup \mathscr {B}(38,64)\cup \mathscr {B}(40,64)\nonumber \\ S^{-1}\mathscr {B}(6,8)= & {} \mathscr {B}(18,64)\cup \mathscr {B}(50,64)\cup \mathscr {B}(28,64)\cup \mathscr {B}(60,64)\cup \mathscr {B}(29,64)\nonumber \\&\cup \, \mathscr {B}(61,64)\cup \mathscr {B}(15,64)\cup \mathscr {B}(31,64)\cup \mathscr {B}(47,64) \nonumber \\&\cup \, \mathscr {B}(63,64)\cup \mathscr {B}(48,64)\nonumber \\ S^{-1}\mathscr {B}(7,8)= & {} \mathscr {B}(9,64)\cup \mathscr {B}(41,64)\cup \mathscr {B}(30,64)\cup \mathscr {B}(62,64)\cup \mathscr {B}(56,64). \end{aligned}$$


Let \(n\in \mathscr {B}(l,64)\) for some \(l=0,\ldots ,63\), that is, there exists \(k\in \mathbb {N}\cup \{0\}\) such that \(n=l+64k\). Let \(l\equiv i\) mod8, namely \(l=i+8h\) for some \(i=0,\ldots ,7\) and \(h=0,\ldots ,7\).

We can then evaluate S(n) using Eq. (4):
$$\begin{aligned} S(n)= & {} \frac{m_i n+r_i}{8} =\frac{m_i (l+64k)+r_i}{8}=\frac{m_i l+r_i}{8}+{8m_ik}\nonumber \\= & {} \frac{m_i (i+8h)+r_i}{8}+ {8m_ik}=x_i+m_ih+{8m_ik}, \end{aligned}$$
where we used the definition of \(x_i\) in the rightmost step. Finally \(S(n)\in \mathscr {B}(j,8)\) for some \(j=0,\ldots ,7\), if and only if \(S(n)\equiv j\) mod8, that is
$$\begin{aligned} x_i+m_ih\equiv j \quad \textit{mod8}, \end{aligned}$$
or equivalently
$$\begin{aligned} m_ih\equiv j - x_i\quad \textit{mod8}. \end{aligned}$$
As stated in Remark 1, Eq. (10) can be solved if and only if \(d_i=gcd(m_i,8)\) is a divisor of \(j - x_i\). Let us turn to computing \(d_i\). From the definitions in Eqs. (5) and (6) we readily get
$$\begin{aligned}&d_0=gcd(1,8)=1,\quad d_1=d_2=d_4=d_5=d_6=gcd(6,8)=2\quad \text { and }\nonumber \\&\qquad \quad d_3=d_7=gcd(36,8)=4. \end{aligned}$$
Consider first the case \(j=0\). Because \(d_0=1\), Eq. (10) has always one solution for \(i=0\), given by \(h=0\). Thus \(l=i+8h=0\). For \(i=1\) (\(d_1=2\) and \(x_1=1\)) Eq. (10) can be solved if and only if \(j-1\) can be divided by 2. Hence, it has no solution when \(j=0\). For \(i=2\) (\(d_2=2\) and \(x_2=2\)) Eq. (10) can be solved because \(j-x_2=0-2\) is divisible by 2. There are in particular 2 solutions, \(h=1\) and \(h=5\), which return \(l=i+8h=10\) and \(l=42\). For \(i=3\) (\(d_3=4\) and \(x_3=16\)), Eq. (10) can be solved if and only if \(j-16\) can be divided by 4. This holds true for \(j=0\). Four solutions are found which correspond to \(h=0,2,4,6\), yielding \(l=3,19,35,51\). For \(i=4\) (\(d_4=2\) and \(x_4=4\)), Eq. (10) can be solved if and only if \(j-4\) can be divided by 2, and this is true if \(j=0\). The two obtained solutions are \(h=2\) and \(h=6\) giving in turn \(l=20\) and \(l=52\). The same is true for \(i=5\) (\(d_5=2\) and \(x_5=4\)). In this case one obtains \(l=21\) and \(l=53\). For \(i=6\) (\(d_6=2\) and \(x_6=5\)), Eq. (10) can be solved if and only if \(j-5\) can be divided by 2 and this is impossible for \(j=0\). Finally for \(i=7\) (\(d_7=4\) and \(x_7=34\)), Eq. (10) can be solved if and only if \(j-34\) can be divided by 4 and this condition is not met for \(j=0\).

In conclusion we showed that \(S^{-1}\mathscr {B}(0,8)\) is the union of \(\mathscr {B}(0,64)\cup \mathscr {B}(10,64)\cup \mathscr {B}(42,64)\cup \mathscr {B}(20,64)\cup \mathscr {B}(52,64)\) and \(\mathscr {B}(3,64)\cup \mathscr {B}(19,64)\cup \mathscr {B}(35,64)\cup \mathscr {B}(51,64)\cup \mathscr {B}(21,64)\cup \mathscr {B}(53,64)\). The first set is composed of 5 classes corresponding to even l, while the second is made of 6 classes with odd l.

Let us now consider \(j=1\). For \(i=0\) there is only one solution \(h=1\) and thus \(l=8\). For \(i=1\) there are two solutions, since \(j-1\) is even. These are \(h=0,4\), hence \(l=1,33\). For \(i=6\) two solutions are found because \(j-5\) can be divided by 2. These are \(h=2,6\), yielding \(l=22,54\). For the remaining cases \(i=2,3,4,5,7\) no solutions are possible. Summing up, \(S^{-1}\mathscr {B}(1,8)\) is the union of three classes with even l, \(\mathscr {B}(22,64)\cup \mathscr {B}(54,64)\cup \mathscr {B}(8,64)\), and two classes with odd l, \(\mathscr {B} (1,64)\cup \mathscr {B}(33,64)\).

The remaining cases can be handled similarly and are not discussed here in detail.
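The full list of Proposition 2 can be recovered by direct enumeration. Since \(S(l+64k)=S(l)+8m_ik\), the residue of S(n) mod8 depends only on the residue l of n mod64, so it suffices to evaluate S on the 64 representatives \(l=0,\ldots ,63\). The sketch below uses this observation; the names `M`, `R`, `S` and `preimage_classes` are ours.

```python
M = [1, 6, 6, 36, 6, 6, 6, 36]   # multipliers m_i of the compact form of S
R = [0, 2, 4, 20, 8, 2, 4, 20]   # offsets r_i of the compact form of S


def S(n):
    """Third iterate of the Collatz map in its compact mod8 form."""
    i = n % 8
    return (M[i] * n + R[i]) // 8


def preimage_classes(j):
    """Residues l mod 64 such that S maps the class B(l,64) into B(j,8)."""
    return sorted(l for l in range(64) if S(l) % 8 == j)


for j in range(8):
    print(j, preimage_classes(j))
```

For \(j=0\) and \(j=1\) the output reproduces the residues obtained in the proof, and the eight lists together exhaust the 64 classes mod64 without repetition.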

To go one step further and explicitly compute the transition probabilities of Eq. (8) from the preimages given by Proposition 2, we introduce an S-invariant (probability) measure. As we will clarify in the following, the measure is invariant under S on the family of generalised congruence classes \(\mathscr {B}(j,8^m)\).

2.3 An S-invariant probability measure

For any fixed integer \(m\ge 1\) we introduce a partition of \(\mathbb {N}\) into disjoint congruence classes, \(\mathscr {B}(i,8^m)\), \(i\in \{0,\ldots , 8^m-1\}\) and we will use them to define a (1-parameter family of) invariant measures \(\mu ^{(m)}_{inv}\). As we will clarify hereafter the introduced measure depends on the index m, which identifies the class of pertinence.

Let n be a positive integer and assume it belongs to \(\mathscr {B}(i,8^m)\) for some i. Hence, \(n=i+k\,8^m\) for some integer \(k\ge 0\) and \(i\in \{0,\ldots , 8^m-1\}\). Then there exists a unique string \(s_0,\ldots ,s_{m-1}\in \{0,\ldots ,7\}\) such that \(i=s_{m-1}8^{m-1}+\cdots +s_18+s_0\). Based on the above we define the measure \(\mu ^{(m)}_{inv}\) of the integer n to be:
$$\begin{aligned} \mu ^{(m)}_{inv}(n)=\frac{1}{2^{k+1}}\frac{1}{8^{m-1}}\nu (s_0), \end{aligned}$$
where the factor \(1/2^{k+1}\) is related to the considered partition, while the second factor quantifies the probability that \(s_1,\ldots ,s_{m-1}\) take any of the symbolic entries \(\{0,\ldots ,7\}\) in the expression for i. These latter probabilities are assumed identical (i.e. equal to 1/8) for all symbols.
The quantities \(\nu (s_0)\) are given by
$$\begin{aligned} \nu (s_0)=\frac{1}{6}\quad \text { if}\quad s_0=0,2,4,6\quad \mathrm{and}\quad \nu (s_0)=\frac{1}{12}\quad \text { if}\quad s_0=1,3,5,7. \end{aligned}$$
The measure of any set of integers is the sum of the measures of the integers forming the set. The introduced measure is therefore additive. Based on the above we can straightforwardly measure any class \(\mathscr {B}(i,8^m)\). Following Eq. (12) the measure of \(\mathscr {B}(i,8^m)\) reads:
$$\begin{aligned} \mu ^{(m)}_{inv}(\mathscr {B}(i,8^m))=\sum _{n\in \mathscr {B}(i,8^m)}\mu ^{(m)}_{inv}(n)=\sum _{k\ge 0 }\frac{1}{2^{k+1}}\frac{\nu (s_0)}{8^{m-1}}=\frac{\nu (s_0)}{8^{m-1}}, \end{aligned}$$
where \(i=s_0+s_1 8+\cdots +s_{m-1}8^{m-1}\) for \(s_0,\ldots ,s_{m-1}\in \{0,\ldots ,7\}\), hence
$$\begin{aligned} \mu ^{(m)}_{inv}(\mathscr {B}(i,8^m))= & {} \frac{1}{6}\frac{1}{8^{m-1}}\quad \text { if}\quad s_0=0,2,4,6\quad \text {and} \nonumber \\ \mu ^{(m)}_{inv}(\mathscr {B}(i,8^m))= & {} \frac{1}{12}\frac{1}{8^{m-1}}\quad \text { if}\quad s_0=1,3,5,7. \end{aligned}$$
Clearly \(\mathbb {N}=\bigcup _{s_0,\ldots ,s_{m-1}}\mathscr {B}(s_0+s_1 8+\cdots +s_{m-1}8^{m-1},8^m)\) and thus
$$\begin{aligned} \mu ^{(m)}_{inv}(\mathbb {N})= & {} \sum _{s_0,\ldots ,s_{m-1}} \mu ^{(m)}_{inv}(\mathscr {B}(s_0+s_1 8+\cdots +s_{m-1}8^{m-1},8^m)) =\sum _{s_0,\ldots ,s_{m-1}}\nu (s_0)\frac{1}{8^{m-1}}\nonumber \\= & {} \sum _{s_0}\nu (s_0)\frac{1}{8^{m-1}}8^{m-1}=1, \end{aligned}$$
that is \(\mu ^{(m)}_{inv}\) is a probability measure for any \(m\ge 1\).
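The normalisation can be checked with exact rational arithmetic. In the sketch below, `nu` and `class_measure` are hypothetical helper names; `class_measure` uses the closed form of the class measure derived above, and the partial sum illustrates how the weights \(1/2^{k+1}\) add up to one.

```python
from fractions import Fraction


def nu(s0):
    """Weight attached to the least significant base-8 digit s0."""
    return Fraction(1, 6) if s0 % 2 == 0 else Fraction(1, 12)


def class_measure(i, m):
    """Measure of the class B(i, 8^m), i.e. nu(s0) / 8^(m-1) with s0 = i mod 8."""
    return nu(i % 8) / 8 ** (m - 1)


# The geometric weights 1/2^(k+1) sum to 1; a truncated sum misses exactly 2^-K.
K = 20
partial = sum(Fraction(1, 2 ** (k + 1)) for k in range(K))
print(partial == 1 - Fraction(1, 2 ** K))

# Total mass over the 8^m classes, for the first few values of m.
for m in (1, 2, 3):
    total = sum(class_measure(i, m) for i in range(8 ** m))
    print(m, total)
```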

Remark 2

Notice that the invariant measure \(\mu ^{(m)}_{inv}\) can be equivalently defined working at the level of classes. One can in fact define:
$$\begin{aligned} \mu ^{(m)}_{inv}(\mathscr {B}(i,8^m)) = \frac{1}{8^{m-1}} \nu (i), \end{aligned}$$
where
$$\begin{aligned} \nu (i)=\frac{1}{6}\quad \text {if}\quad i \bmod 8 = 0,2,4,6\quad \text {and}\quad \nu (i)=\frac{1}{12}\quad \text { if}\quad i \bmod 8=1,3,5,7, \end{aligned}$$
so that
$$\begin{aligned} \mu ^{(m)}_{inv}(\mathbb {N})=\sum _{i=0}^{8^m-1} \mu ^{(m)}_{inv}(\mathscr {B}(i,8^m))=\sum _{\sigma =0}^7\nu (\sigma )=1. \end{aligned}$$

One can prove that the above introduced measure is invariant under S for all sets made by any finite intersection and union of the congruence classes defined above. This is established in the following Theorem.

Theorem 1

For any \(m\ge 1\) and for all \(j=0,\ldots ,8^m-1\) we have
$$\begin{aligned} \mu _{inv}[S^{-1}\mathscr {B}(j,8^m)]=\mu _{inv} [\mathscr {B}(j,8^m)]. \end{aligned}$$

The proof of the above theorem relies on the following Proposition.

Proposition 3

For any \(m\ge 1\) and for all \(j=0,\ldots ,8^m-1\) we have
$$\begin{aligned} S^{-1}\mathscr {B}(j,8^m)=A^{(m)}_e(j)\cup A^{(m)}_o(j), \end{aligned}$$
where \(A^{(m)}_e(j)\) is the union of disjoint classes \(\mathscr {B}(l,8^{m+1})\) with l even and \(A^{(m)}_o(j)\) is the union of disjoint classes \(\mathscr {B}(l,8^{m+1})\) with l odd. Moreover if j is even then \(A^{(m)}_e(j)\) contains five elements and \(A^{(m)}_o(j)\) six elements, while if j is odd then \(A^{(m)}_e(j)\) contains three elements and \(A^{(m)}_o(j)\) two elements.


Observe that Eq. (17) holds true for \(m=1\) by Proposition 2. Let us assume it is true for all \(k\le m-1\) and prove it for \(k=m\), namely we have to prove that \(S^{-1}\mathscr {B}(j,8^m)\) is the disjoint union of classes \(\mathscr {B}(l,8^{m+1})\) and more precisely \(\#A_e^{(m)}(j)=5\) if j is even and 3 if j is odd, while \(\#A_o^{(m)}(j)=6\) if j is even and 2 if j is odd.

Let thus \(n\in \mathscr {B}(l,8^{m+1})\) for some \(l=0,\ldots ,8^{m+1}-1\), that is \(n=l+8^{m+1}k\) for some integer \(k\ge 0\). Let \(l\equiv i\) mod8, i.e. \(l=i+8h\) with \(h=0,\ldots ,8^m-1\). From the definition (3) we can compute
$$\begin{aligned} S(n)=\frac{m_in+r_i}{8}=\frac{m_i l+r_i}{8}+m_i8^{m}k=x_i+m_i h+m_i8^{m}k, \end{aligned}$$
hence \(S(n)\in \mathscr {B}(j,8^{m})\) for some \(j=0,\ldots ,8^{m}-1\) if and only if \(S(n)\equiv j\) mod\(8^m\), namely
$$\begin{aligned} m_ih\equiv j-x_i \quad \textit{mod} 8^m. \end{aligned}$$
This equation can be solved if and only if \(d_i^{(m)}=gcd(m_i,8^m)\) is a divisor of \(j-x_i\). Let us first observe that for all \(m\ge 1\), \(d_i^{(m)}=d_i\), where the \(d_i\) are those defined in Eq. (11).

Let us now determine \(j_1\in \{0,\ldots , 8^{m-1}-1\}\) such that \(j \equiv j_1\) mod \(8^{m-1}\). Observing that \(j-x_i=j_1-x_i+q8^{m-1}\) for some integer q, and that \(d_i\) divides \(8^{m-1}\) for \(m\ge 2\), we can conclude that \(d_i\) divides \(j-x_i\) if and only if it divides \(j_1-x_i\). Hence, using the induction hypothesis, we can conclude that \(m_ih\equiv j-x_i\) mod\(8^m\) has solutions if and only if the same equation for \(m-1\) has solutions, and moreover they have the same number of solutions.

This implies that \(\#A_e^{(m)}(j)=\#A_e^{(m-1)}(j_1)\) and \(\#A_o^{(m)}(j)=\#A_o^{(m-1)}(j_1)\), because the mod \(8^{m-1}\) operation does not change the parity of j. The claim is then proved using the results of Proposition 2.
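The counting claimed in Proposition 3 can be checked by enumeration for small m. The sketch below relies on the fact, used in the proof, that the residue of S(n) mod\(8^m\) depends only on the residue l of n mod\(8^{m+1}\); the function name `preimage_counts` is ours and only the cases \(m=1,2\) are tested.

```python
M = [1, 6, 6, 36, 6, 6, 6, 36]   # multipliers m_i of the compact form of S
R = [0, 2, 4, 20, 8, 2, 4, 20]   # offsets r_i of the compact form of S


def S(n):
    """Third iterate of the Collatz map in its compact mod8 form."""
    i = n % 8
    return (M[i] * n + R[i]) // 8


def preimage_counts(j, m):
    """(number of even l, number of odd l) mod 8^(m+1) with S(l) in B(j, 8^m)."""
    residues = [l for l in range(8 ** (m + 1)) if S(l) % 8 ** m == j]
    even = sum(1 for l in residues if l % 2 == 0)
    return even, len(residues) - even


for m in (1, 2):
    even_j = {preimage_counts(j, m) for j in range(0, 8 ** m, 2)}
    odd_j = {preimage_counts(j, m) for j in range(1, 8 ** m, 2)}
    print(m, even_j, odd_j)   # {(5, 6)} for even j and {(3, 2)} for odd j
```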

We can now prove Theorem 1.


Let \(m\ge 1\) and \(j=0,\ldots ,8^{m}-1\), then thanks to Proposition 3 we have
$$\begin{aligned} \mu _{inv}[S^{-1}\mathscr {B}(j,8^m)]=\mu _{inv} [A^{(m)}_e(j)]+\mu _{inv}[A^{(m)}_o(j)], \end{aligned}$$
since the classes are disjoint. Observe that all classes \(\mathscr {B}(l,8^{m+1})\in A^{(m)}_e(j)\) have the same measure given by \(1/6\times 1/8^m\), and the same is true for classes in \(A^{(m)}_o(j)\), with measure \(1/12\times 1/8^m\), thus
$$\begin{aligned} \mu _{inv}[S^{-1}\mathscr {B}(j,8^m)]=\frac{1}{8^m}\frac{1}{6}\#A^{(m)}_e(j)+\frac{1}{8^m}\frac{1}{12}\#A^{(m)}_o(j). \end{aligned}$$
If j is even we get:
$$\begin{aligned} \mu _{inv}[S^{-1}\mathscr {B}(j,8^m)]=\frac{1}{8^m} \frac{1}{6}5+\frac{1}{8^m}\frac{1}{12}6=\frac{1}{8^{m-1}}\frac{1}{6}=\mu _{inv} [\mathscr {B}(j,8^m)], \end{aligned}$$
If j is odd we get:
$$\begin{aligned} \mu _{inv}[S^{-1}\mathscr {B}(j,8^m)]=\frac{1}{8^m} \frac{1}{6}3+\frac{1}{8^m}\frac{1}{12}2=\frac{1}{8^{m-1}}\frac{1}{12} =\mu _{inv}[\mathscr {B}(j,8^m)]. \end{aligned}$$
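The invariance stated in Theorem 1 can also be verified numerically for small m. In the sketch below we assume, as in the proof, that the preimage is measured through its constituent classes mod\(8^{m+1}\), each carrying the weight \(\nu /8^{m}\), while the target class mod\(8^m\) carries the weight \(\nu /8^{m-1}\). The helper names are ours and the check covers \(m=1,2\) only.

```python
from fractions import Fraction

M = [1, 6, 6, 36, 6, 6, 6, 36]   # multipliers m_i of the compact form of S
R = [0, 2, 4, 20, 8, 2, 4, 20]   # offsets r_i of the compact form of S


def S(n):
    """Third iterate of the Collatz map in its compact mod8 form."""
    i = n % 8
    return (M[i] * n + R[i]) // 8


def nu(i):
    """Weight of a class, determined by the parity of its residue."""
    return Fraction(1, 6) if i % 2 == 0 else Fraction(1, 12)


def class_measure(i, m):
    """Measure of the class B(i, 8^m)."""
    return nu(i % 8) / 8 ** (m - 1)


def preimage_measure(j, m):
    """Sum of the measures of the classes mod 8^(m+1) that S maps into B(j, 8^m)."""
    return sum(class_measure(l, m + 1)
               for l in range(8 ** (m + 1)) if S(l) % 8 ** m == j)


invariant = all(preimage_measure(j, m) == class_measure(j, m)
                for m in (1, 2) for j in range(8 ** m))
print(invariant)
```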

Remark 3

Given a positive integer \(m^*\) and the measure \(\mu _{inv}^{(m^*)}\), all measures \(\mu _{inv}^{(m)}\), \(\forall m<m^*\) are push-forward measures of \(\mu _{inv}^{(m^*)}\): \(\mu _{inv}^{(m^*-1)}(\cdot )\! :=\! \mu _{inv}^{(m^*)} (S^{-1}(\cdot ))\). In fact \(\mu _{inv}^{(m^*-1)}(\mathscr {B}(i,8^{m^*-1})) \!=\! \mu _{inv}^{(m^*)} (S^{-1} \mathscr {B}(i,8^{m^*-1})) \!=\! \mu _{inv}^{(m^{*})} (\bigcup _j \mathscr {B}(j,8^{m^{*}})) = \sum _j \mu _{inv}^{(m^*)} (\mathscr {B}(j,8^{m^*}))\), where the last sum extends over the classes mod \(8^{m^*}\) that constitute the preimage of \(\mathscr {B}(i,8^{m^*-1})\) under S. Based on Proposition 3 it readily follows that \(\mu _{inv}^{(m^*-1)}(\mathscr {B}(i,8^{m^*-1})) = \nu (i)/8^{m^*-2}\), in agreement with Eq. (14). The reasoning can be further iterated to all \(m<m^*\), so recovering the measure of the classes as introduced above.

Remark 4

Writing the singleton set \(\{n\}\) as \(\bigcap _{m\ge m_0} \mathscr {B}(n,8^m)\), where \(m_0\) is the smallest integer such that \(n< 8^{m_0}\), one gets \(\mu ^{(\infty )}_{inv}(n)=\lim _{m\rightarrow \infty }\nu (s_0)2^{-(k+1)}8^{-(m-1)}=0\). Integers are hence associated to a trivial (zero) measure. We thus have the invariance also in the limiting case \(m=\infty \), even if in a trivial form: \(\mu ^{(\infty )}_{inv}(S^{-1}\{n\})=\mu ^{(\infty )}_{inv}(\{\text {a finite set of integers}\})=0\), which equals \(\mu ^{(\infty )}_{inv}(\{n\})=0\). Notice however that \(\mu ^{(\infty )}_{inv}\) is not \(\sigma \)-additive, and so, strictly speaking, we cannot refer to it as a measure.

In the following we will be interested in studying the orbit of the third iterate of the Collatz map, namely \(S^{\circ k}\) for \(k\in \mathbb {N}\). We aim in particular at characterising the quasi-stationary dynamics, i.e. the out-of-equilibrium dynamics of the system prior to possible absorption to the equilibrium fixed point, as identified earlier. The transition probabilities \(q_{ij}^*\) given by (8) have been computed using S, i.e. the first iterate of the map. A natural question that arises is how these latter quantities relate to the transition probabilities of the stochastic process defined through \(S^{\circ k}\), for a generic \(k\in \mathbb {N}\). This is established in the following proposition.

Proposition 4

Let
$$\begin{aligned} q^{(k)}_{ij}:=\frac{\mu _{inv}[\mathscr {B}(i,8)\cap S^{\circ -k}\mathscr {B}(j,8)]}{\mu _{inv}[\mathscr {B}(i,8)]}\quad (i,j=0,\ldots ,7), \end{aligned}$$
be the transition probability of the Markov process built from \(S^{\circ k}\), for \(k\in \mathbb {N}\). Then
$$\begin{aligned} q^{(k)}_{ij}=[(q^{*})^k]_{ij}\quad (i,j=0,\ldots ,7). \end{aligned}$$


The proof is straightforward and relies on the Chapman–Kolmogorov equation. Let \(p_{k,h}(j|i)\) be the probability to be in \(\mathscr {B}(j,8)\) after k steps, assuming that the system is in \(\mathscr {B}(i,8)\) after \(0\le h<k\) steps. Then
$$\begin{aligned} p_{k, h}(j|i)=\sum _{m_{h+1},\ldots ,m_{k-1}}p_{k, k-1}(j|m_{k-1})p_{k-1, k-2}(m_{k-1}|m_{k-2})\ldots p_{h+1, h}(m_{h+1}|i). \end{aligned}$$
Let us observe that
$$\begin{aligned} p_{l+1, l}(j|i)= & {} \frac{\mu _{inv}[S^{\circ -(l+1)}\mathscr {B}(j,8)\cap S^{\circ -l}\mathscr {B}(i,8)]}{\mu _{inv}[S^{\circ -l}\mathscr {B}(i,8)]}\nonumber \\= & {} \frac{\mu _{inv}[S^{\circ -l}(S^{\circ -1}\mathscr {B}(j,8)\cap \mathscr {B}(i,8))]}{\mu _{inv}[S^{\circ -l}\mathscr {B}(i,8)]}, \end{aligned}$$
hence using the invariance
$$\begin{aligned} p_{l+1, l}(j|i)=\frac{\mu _{inv}[S^{\circ -1}\mathscr {B}(j,8)\cap \mathscr {B}(i,8)]}{\mu _{inv}[\mathscr {B}(i,8)]}=q^*_{ij}. \end{aligned}$$
Because \(p_{k, 0}(j|i)=q^{(k)}_{ij}\) we obtain:
$$\begin{aligned} q^{(k)}_{ij}=\sum _{m_1,\ldots ,m_{k-1}}q^*_{m_{k-1} j} q^{*}_{m_{k-2} m_{k-1}}\ldots q^*_{i m_{1}}=[(q^*)^k]_{ij}. \end{aligned}$$

The consequences of the above Proposition are of paramount importance for the forthcoming discussion. In particular, it is possible to draw conclusions on the dynamics of S (and hence on its restriction in mod8) by iterating the Markov process defined by the transition probabilities \(q^*_{ij}\). We recall once again that we are here interested in shedding light onto the out-of-equilibrium dynamics of S before the deterministic trajectories reach their asymptotic attractor. As we shall see, by exploiting Proposition 4 it will be possible to deduce effective constraints that are to be matched by the deterministic application.
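Proposition 4 can be illustrated for \(k=2\) by comparing the two-step transition probabilities, computed directly from the measure of the classes, with the square of the one-step matrix. The sketch below rests on one observation of ours: within a class \(\mathscr {B}(i,8)\) all sub-classes mod\(8^{k+1}\) carry the same measure, so the ratio of measures defining \(q^{(k)}_{ij}\) reduces to a ratio of counts. The function names are hypothetical and only \(k=1,2\) are examined.

```python
from fractions import Fraction

M = [1, 6, 6, 36, 6, 6, 6, 36]   # multipliers m_i of the compact form of S
R = [0, 2, 4, 20, 8, 2, 4, 20]   # offsets r_i of the compact form of S


def S(n):
    """Third iterate of the Collatz map in its compact mod8 form."""
    i = n % 8
    return (M[i] * n + R[i]) // 8


def transition_matrix(k):
    """q^(k)_ij obtained by counting the classes mod 8^(k+1) sent from i to j."""
    modulus = 8 ** (k + 1)
    counts = [[0] * 8 for _ in range(8)]
    for l in range(modulus):
        n = l
        for _ in range(k):          # apply S exactly k times
            n = S(n)
        counts[l % 8][n % 8] += 1
    # Each class B(i,8) contains modulus // 8 equally weighted sub-classes.
    return [[Fraction(c, modulus // 8) for c in row] for row in counts]


def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    size = len(A)
    return [[sum(A[i][h] * B[h][j] for h in range(size)) for j in range(size)]
            for i in range(size)]


Q1 = transition_matrix(1)
Q2 = transition_matrix(2)
print(Q2 == matmul(Q1, Q1))
```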

2.4 Computing the transition probabilities

We are now in a position to explicitly compute the transition probabilities \(q^*_{ij}\), formally given by Eq. (8). The latter transition probabilities define the entries of the transition matrix \(Q^*\) which is entirely specified by the following Proposition.

Proposition 5

The matrix \(Q^*\) is stochastic. The entries \(q^*_{ij}\), with i and j in \(\{0,\ldots ,7\}\), determine the probability to reach the target class j, from class i and read:
$$\begin{aligned} q^*_{0j}= & {} \frac{1}{8},\quad \forall j\in \{0,\ldots ,7\} \end{aligned}$$
$$\begin{aligned} q^*_{1j}= & {} \frac{1}{4},\quad \forall j\in \{1,3,5,7\}\quad \text {and}\quad 0\; \text {otherwise}\end{aligned}$$
$$\begin{aligned} q^*_{2j}= & {} \frac{1}{4},\quad \forall j\in \{0,2,4,6\}\quad \text {and}\quad 0\; \text {otherwise}\end{aligned}$$
$$\begin{aligned} q^*_{3j}= & {} \frac{1}{2},\quad \forall j\in \{0,4\}\quad \text {and} \quad 0 \; \text {otherwise}\end{aligned}$$
$$\begin{aligned} q^*_{4j}= & {} \frac{1}{4},\quad \forall j\in \{0,2,4,6\}\quad \text {and}\quad 0\; \text {otherwise}\end{aligned}$$
$$\begin{aligned} q^*_{5j}= & {} \frac{1}{4},\quad \forall j\in \{0,2,4,6\}\quad \text {and}\quad 0\; \text {otherwise}\end{aligned}$$
$$\begin{aligned} q^*_{6j}= & {} \frac{1}{4},\quad \forall j\in \{1,3,5,7\}\quad \text {and} \quad 0\; \text {otherwise}\end{aligned}$$
$$\begin{aligned} q^*_{7j}= & {} \frac{1}{2},\quad \forall j\in \{2,6\}\quad \text { and} \quad 0 \;\text {otherwise} , \end{aligned}$$
or, equivalently, in matrix notation:
$$\begin{aligned} Q^*=\left( \begin{matrix} {\frac{1}{8}} &{}\quad {\frac{1}{8}} &{}\quad {\frac{1}{8}} &{}\quad {\frac{1}{8}} &{}\quad {\frac{1}{8}} &{}\quad {\frac{1}{8}} &{}\quad {\frac{1}{8}} &{}\quad {\frac{1}{8}}\\ 0 &{}\quad {\frac{1}{4}} &{}\quad 0 &{}\quad {\frac{1}{4}} &{}\quad 0 &{}\quad {\frac{1}{4}} &{}0 &{} {\frac{1}{4}}\\ {\frac{1}{4}} &{}\quad 0 &{}\quad {\frac{1}{4}} &{}\quad 0 &{}\quad {\frac{1}{4}} &{}\quad 0 &{}\quad {\frac{1}{4}} &{}\quad 0\\ {\frac{1}{2}} &{}\quad 0 &{}\quad 0 &{}\quad 0 &{}\quad {\frac{1}{2}} &{}\quad 0 &{}\quad 0 &{}\quad 0\\ {\frac{1}{4}} &{}\quad 0 &{}\quad {\frac{1}{4}} &{}\quad 0 &{}\quad {\frac{1}{4}} &{}\quad 0 &{}\quad {\frac{1}{4}} &{}\quad 0\\ {\frac{1}{4}} &{}\quad 0 &{}\quad {\frac{1}{4}} &{}\quad 0 &{}\quad {\frac{1}{4}} &{}\quad 0 &{}\quad {\frac{1}{4}} &{}\quad 0\\ 0 &{} {\frac{1}{4}} &{}\quad 0 &{}\quad {\frac{1}{4}} &{}\quad 0 &{}\quad {\frac{1}{4}} &{}\quad 0 &{}\quad {\frac{1}{4}}\\ 0 &{}\quad 0 &{}\quad {\frac{1}{2}} &{}\quad 0 &{}\quad 0&{}\quad 0 &{}\quad {\frac{1}{2}} &{}\quad 0\\ \end{matrix} \right) . \end{aligned}$$


Let us begin by proving that \(Q^*\) is a stochastic matrix, that is \(\sum _jq^*_{ij}=1\) for all \(i=0,\ldots ,7\). Recall that the classes \(\mathscr {B}(l,64)\) are disjoint and that their union is the whole set of natural numbers \(\mathbb {N}\). We hence get from Eq. (9)
$$\begin{aligned} \bigcup _{j=0}^7(\mathscr {B}(i,8)\cap S^{-1}\mathscr {B}(j,8))= & {} \mathscr {B}(i,8)\cap \left( \bigcup _{j=0}^7 S^{-1}\mathscr {B}(j,8)\right) =\mathscr {B}(i,8)\cap \left( \bigcup _{l=0}^{63} \mathscr {B}(l,64)\right) \nonumber \\= & {} \mathscr {B}(i,8)\cap \mathbb {N}=\mathscr {B}(i,8), \end{aligned}$$
and thus from Eq. (8):
$$\begin{aligned} \sum _{j=0}^7 q^*_{ij}= & {} \sum _{j=0}^7\frac{\mu _{inv}[\mathscr {B}(i,8)\cap S^{-1}\mathscr {B}(j,8)]}{\mu _{inv}[\mathscr {B}(i,8)]}\nonumber \\= & {} \frac{1}{\mu _{inv}[\mathscr {B}(i,8)]}\sum _{j=0}^7\mu _{inv}[\mathscr {B}(i,8)\cap S^{-1}\mathscr {B}(j,8)]\nonumber \\= & {} \frac{1}{\mu _{inv}[\mathscr {B}(i,8)]} \mu _{inv}[\mathscr {B}(i,8)]=1. \end{aligned}$$
The remaining part of the proof is straightforward. To compute the values of \(q^*_{ij}\) given by Eq. (8), one makes use of Proposition 2. Moreover, we recall that \(\mu _{inv}[\mathscr {B}(i,8)]=1/6\) if \(i=0,2,4,6\) and 1/12 if \(i=1,3,5,7\), and that \(\mu _{inv}[\mathscr {B}(i,64)]=\mu _{inv}[\mathscr {B}(j,8)]/8\) if \(i\equiv j \mod 8\), as follows from the definition of the invariant probability measure \(\mu _{inv}\).
Consider \(q^*_{0j}\):
$$\begin{aligned} q^*_{0j}= & {} \frac{\mu _{inv}[\mathscr {B}(0,8)\cap S^{-1}\mathscr {B}(j,8)]}{\mu _{inv}[\mathscr {B}(0,8)]}=\frac{1/6\times 1/8}{1/6}=\frac{1}{8}\quad (j=0,\ldots ,7). \end{aligned}$$
Let us now turn to considering \(q^*_{1j}\):
$$\begin{aligned} q^*_{10}= & {} \frac{\mu _{inv}[\mathscr {B}(1,8)\cap S^{-1}\mathscr {B}(0,8)]}{\mu _{inv}[\mathscr {B}(1,8)]}=0\nonumber \\ q^*_{11}= & {} \frac{\mu _{inv}[\mathscr {B}(1,8)\cap S^{-1}\mathscr {B}(1,8)]}{\mu _{inv}[\mathscr {B}(1,8)]}\nonumber \\= & {} \frac{\mu _{inv}[\mathscr {B}(1,64)\cup \mathscr {B}(33,64)]}{\mu _{inv}[\mathscr {B}(1,8)]}=\frac{2\times 1/12\times 1/8}{1/12}=\frac{1}{4}\nonumber \\ q^*_{12}= & {} \frac{\mu _{inv}[\mathscr {B}(1,8)\cap S^{-1}\mathscr {B}(2,8)]}{\mu _{inv}[\mathscr {B}(1,8)]}=0\nonumber \\ q^*_{13}= & {} \frac{\mu _{inv}[\mathscr {B}(1,8)\cap S^{-1}\mathscr {B}(3,8)]}{\mu _{inv}[\mathscr {B}(1,8)]}\nonumber \\= & {} \frac{\mu _{inv}[\mathscr {B}(25,64)\cup \mathscr {B}(57,64)]}{\mu _{inv}[\mathscr {B}(1,8)]}=\frac{2\times 1/12\times 1/8}{1/12}=\frac{1}{4}\nonumber \\ q^*_{14}= & {} \frac{\mu _{inv}[\mathscr {B}(1,8)\cap S^{-1}\mathscr {B}(4,8)]}{\mu _{inv}[\mathscr {B}(1,8)]}=0\nonumber \\ q^*_{15}= & {} \frac{\mu _{inv}[\mathscr {B}(1,8)\cap S^{-1}\mathscr {B}(5,8)]}{\mu _{inv}[\mathscr {B}(1,8)]}\nonumber \\= & {} \frac{\mu _{inv}[\mathscr {B}(17,64)\cup \mathscr {B}(49,64)]}{\mu _{inv}[\mathscr {B}(1,8)]}=\frac{2\times 1/12\times 1/8}{1/12}=\frac{1}{4}\nonumber \\ q^*_{16}= & {} \frac{\mu _{inv}[\mathscr {B}(1,8)\cap S^{-1}\mathscr {B}(6,8)]}{\mu _{inv}[\mathscr {B}(1,8)]}=0\nonumber \\ q^*_{17}= & {} \frac{\mu _{inv}[\mathscr {B}(1,8)\cap S^{-1}\mathscr {B}(7,8)]}{\mu _{inv}[\mathscr {B}(1,8)]}\nonumber \\= & {} \frac{\mu _{inv}[\mathscr {B}(9,64)\cup \mathscr {B}(41,64)]}{\mu _{inv}[\mathscr {B}(1,8)]}=\frac{2\times 1/12\times 1/8}{1/12}=\frac{1}{4}. \end{aligned}$$
Similarly for \(q^*_{2j}\) we have:
$$\begin{aligned} q^*_{20}= & {} \frac{\mu _{inv}[ \mathscr {B}(2,8)\cap S^{-1} \mathscr {B}(0,8)]}{\mu _{inv}[ \mathscr {B}(2,8)]}\nonumber \\= & {} \frac{\mu _{inv}[\mathscr {B}(10,64)\cup \mathscr {B}(42,64)]}{\mu _{inv}[ \mathscr {B}(2,8)]}=\frac{2\times 1/6\times 1/8}{1/6}=\frac{1}{4}\nonumber \\ q^*_{21}= & {} \frac{\mu _{inv}[ \mathscr {B}(2,8)\cap S^{-1} \mathscr {B}(1,8)]}{\mu _{inv}[ \mathscr {B}(2,8)]}=0\nonumber \\ q^*_{22}= & {} \frac{\mu _{inv}[ \mathscr {B}(2,8)\cap S^{-1} \mathscr {B}(2,8)]}{\mu _{inv}[ \mathscr {B}(2,8)]}\nonumber \\= & {} \frac{\mu _{inv}[\mathscr {B} (2,64)\cup \mathscr {B}(34,64)]}{\mu _{inv}[ \mathscr {B}(2,8)]}=\frac{2\times 1/6\times 1/8}{1/6}=\frac{1}{4}\nonumber \\ q^*_{23}= & {} \frac{\mu _{inv}[ \mathscr {B}(2,8)\cap S^{-1} \mathscr {B}(3,8)]}{\mu _{inv}[ \mathscr {B}(2,8)]}=0\nonumber \\ q^*_{24}= & {} \frac{\mu _{inv}[ \mathscr {B}(2,8)\cap S^{-1} \mathscr {B}(4,8)]}{\mu _{inv}[ \mathscr {B}(2,8)]}\nonumber \\= & {} \frac{\mu _{inv}[\mathscr {B}(26,64)\cup \mathscr {B}(58,64)]}{\mu _{inv}[ \mathscr {B}(2,8)]}=\frac{2\times 1/6\times 1/8}{1/6}=\frac{1}{4}\nonumber \\ q^*_{25}= & {} \frac{\mu _{inv}[ \mathscr {B}(2,8)\cap S^{-1} \mathscr {B}(5,8)]}{\mu _{inv}[ \mathscr {B}(2,8)]}=0\nonumber \\ q^*_{26}= & {} \frac{\mu _{inv}[ \mathscr {B}(2,8)\cap S^{-1} \mathscr {B}(6,8)]}{\mu _{inv}[ \mathscr {B}(2,8)]}\nonumber \\= & {} \frac{\mu _{inv}[\mathscr {B}(18,64)\cup \mathscr {B}(50,64)]}{\mu _{inv}[ \mathscr {B}(2,8)]}=\frac{2\times 1/6\times 1/8}{1/6}=\frac{1}{4}\nonumber \\ q^*_{27}= & {} \frac{\mu _{inv}[ \mathscr {B}(2,8)\cap S^{-1} \mathscr {B}(7,8)]}{\mu _{inv}[ \mathscr {B}(2,8)]}=0. \end{aligned}$$
Then for \(q^*_{3j}\) we get:
$$\begin{aligned} q^*_{30}= & {} \frac{\mu _{inv}[ \mathscr {B}(3,8)\cap S^{-1} \mathscr {B}(0,8)]}{\mu _{inv}[ \mathscr {B}(3,8)]}\nonumber \\= & {} \frac{\mu _{inv}[\mathscr {B}(3,64)\cup \mathscr {B}(19,64)\cup \mathscr {B}(35,64)\cup \mathscr {B}(51,64)]}{\mu _{inv}[ \mathscr {B}(3,8)]}=\frac{4\times 1/12\times 1/8}{1/12}=\frac{1}{2}\nonumber \\ q^*_{31}= & {} \frac{\mu _{inv}[ \mathscr {B}(3,8)\cap S^{-1} \mathscr {B}(1,8)]}{\mu _{inv}[ \mathscr {B}(3,8)]}=0\nonumber \\ q^*_{32}= & {} \frac{\mu _{inv}[ \mathscr {B}(3,8)\cap S^{-1} \mathscr {B}(2,8)]}{\mu _{inv}[ \mathscr {B}(3,8)]}=0\nonumber \\ q^*_{33}= & {} \frac{\mu _{inv}[ \mathscr {B}(3,8)\cap S^{-1} \mathscr {B}(3,8)]}{\mu _{inv}[ \mathscr {B}(3,8)]}=0\nonumber \\ q^*_{34}= & {} \frac{\mu _{inv}[ \mathscr {B}(3,8)\cap S^{-1} \mathscr {B}(4,8)]}{\mu _{inv}[ \mathscr {B}(3,8)]}\nonumber \\= & {} \frac{\mu _{inv}[\mathscr {B}(11,64)\cup \mathscr {B}(27,64)\cup \mathscr {B}(43,64)\cup \mathscr {B}(59,64)]}{\mu _{inv}[ \mathscr {B}(3,8)]}=\frac{4\times 1/12\times 1/8}{1/12}=\frac{1}{2}\nonumber \\ q^*_{35}= & {} \frac{\mu _{inv}[ \mathscr {B}(3,8)\cap S^{-1} \mathscr {B}(5,8)]}{\mu _{inv}[ \mathscr {B}(3,8)]}=0\nonumber \\ q^*_{36}= & {} \frac{\mu _{inv}[ \mathscr {B}(3,8)\cap S^{-1} \mathscr {B}(6,8)]}{\mu _{inv}[ \mathscr {B}(3,8)]}=0\nonumber \\ q^*_{37}= & {} \frac{\mu _{inv}[ \mathscr {B}(3,8)\cap S^{-1} \mathscr {B}(7,8)]}{\mu _{inv}[ \mathscr {B}(3,8)]}=0. \end{aligned}$$
For \(q^*_{4j}\):
$$\begin{aligned} q^*_{40}= & {} \frac{\mu _{inv}[ \mathscr {B}(4,8)\cap S^{-1} \mathscr {B}(0,8)]}{\mu _{inv}[ \mathscr {B}(4,8)]}\nonumber \\= & {} \frac{\mu _{inv}[\mathscr {B}(52,64)\cup \mathscr {B}(20,64)]}{\mu _{inv}[ \mathscr {B}(4,8)]}=\frac{2\times 1/6\times 1/8}{1/6}=\frac{1}{4}\nonumber \\ q^*_{41}= & {} \frac{\mu _{inv}[ \mathscr {B}(4,8)\cap S^{-1} \mathscr {B}(1,8)]}{\mu _{inv}[ \mathscr {B}(4,8)]}=0\nonumber \\ q^*_{42}= & {} \frac{\mu _{inv}[ \mathscr {B}(4,8)\cap S^{-1} \mathscr {B}(2,8)]}{\mu _{inv}[ \mathscr {B}(4,8)]}\nonumber \\= & {} \frac{\mu _{inv}[\mathscr {B}(12,64)\cup \mathscr {B}(44,64)]}{\mu _{inv}[ \mathscr {B}(4,8)]}=\frac{2\times 1/6\times 1/8}{1/6}=\frac{1}{4}\nonumber \\ q^*_{43}= & {} \frac{\mu _{inv}[ \mathscr {B}(4,8)\cap S^{-1} \mathscr {B}(3,8)]}{\mu _{inv}[ \mathscr {B}(4,8)]}=0\nonumber \\ q^*_{44}= & {} \frac{\mu _{inv}[ \mathscr {B}(4,8)\cap S^{-1} \mathscr {B}(4,8)]}{\mu _{inv}[ \mathscr {B}(4,8)]}\nonumber \\= & {} \frac{\mu _{inv}[\mathscr {B} (4,64)\cup \mathscr {B}(36,64)]}{\mu _{inv}[ \mathscr {B}(4,8)]}=\frac{2\times 1/6\times 1/8}{1/6}=\frac{1}{4}\nonumber \\ q^*_{45}= & {} \frac{\mu _{inv}[ \mathscr {B}(4,8)\cap S^{-1} \mathscr {B}(5,8)]}{\mu _{inv}[ \mathscr {B}(4,8)]}=0\nonumber \\ q^*_{46}= & {} \frac{\mu _{inv}[ \mathscr {B}(4,8)\cap S^{-1} \mathscr {B}(6,8)]}{\mu _{inv}[ \mathscr {B}(4,8)]}\nonumber \\= & {} \frac{\mu _{inv}[\mathscr {B}(28,64)\cup \mathscr {B}(60,64)]}{\mu _{inv}[ \mathscr {B}(4,8)]}=\frac{2\times 1/6\times 1/8}{1/6}=\frac{1}{4}\nonumber \\ q^*_{47}= & {} \frac{\mu _{inv}[ \mathscr {B}(4,8)\cap S^{-1} \mathscr {B}(7,8)]}{\mu _{inv}[ \mathscr {B}(4,8)]}=0. \end{aligned}$$
For \(q^*_{5j}\):
$$\begin{aligned} q^*_{50}= & {} \frac{\mu _{inv}[ \mathscr {B}(5,8)\cap S^{-1} \mathscr {B}(0,8)]}{\mu _{inv}[ \mathscr {B}(5,8)]}\nonumber \\= & {} \frac{\mu _{inv}[\mathscr {B}(21,64)\cup \mathscr {B}(53,64)]}{\mu _{inv}[ \mathscr {B}(5,8)]}=\frac{2\times 1/12\times 1/8}{1/12}=\frac{1}{4}\nonumber \\ q^*_{51}= & {} \frac{\mu _{inv}[ \mathscr {B}(5,8)\cap S^{-1} \mathscr {B}(1,8)]}{\mu _{inv}[ \mathscr {B}(5,8)]}=0\nonumber \\ q^*_{52}= & {} \frac{\mu _{inv}[ \mathscr {B}(5,8)\cap S^{-1} \mathscr {B}(2,8)]}{\mu _{inv}[ \mathscr {B}(5,8)]}\nonumber \\= & {} \frac{\mu _{inv}[\mathscr {B}(13,64)\cup \mathscr {B}(45,64)]}{\mu _{inv}[ \mathscr {B}(5,8)]}=\frac{2\times 1/12\times 1/8}{1/12}=\frac{1}{4}\nonumber \\ q^*_{53}= & {} \frac{\mu _{inv}[ \mathscr {B}(5,8)\cap S^{-1} \mathscr {B}(3,8)]}{\mu _{inv}[ \mathscr {B}(5,8)]}=0\nonumber \\ q^*_{54}= & {} \frac{\mu _{inv}[ \mathscr {B}(5,8)\cap S^{-1} \mathscr {B}(4,8)]}{\mu _{inv}[ \mathscr {B}(5,8)]}\nonumber \\= & {} \frac{\mu _{inv}[\mathscr {B} (5,64)\cup \mathscr {B}(37,64)]}{\mu _{inv}[ \mathscr {B}(5,8)]}=\frac{2\times 1/12\times 1/8}{1/12}=\frac{1}{4}\nonumber \\ q^*_{55}= & {} \frac{\mu _{inv}[ \mathscr {B}(5,8)\cap S^{-1} \mathscr {B}(5,8)]}{\mu _{inv}[ \mathscr {B}(5,8)]}=0\nonumber \\ q^*_{56}= & {} \frac{\mu _{inv}[ \mathscr {B}(5,8)\cap S^{-1} \mathscr {B}(6,8)]}{\mu _{inv}[ \mathscr {B}(5,8)]}\nonumber \\= & {} \frac{\mu _{inv}[\mathscr {B}(29,64)\cup \mathscr {B}(61,64)]}{\mu _{inv}[ \mathscr {B}(5,8)]}=\frac{2\times 1/12\times 1/8}{1/12}=\frac{1}{4}\nonumber \\ q^*_{57}= & {} \frac{\mu _{inv}[ \mathscr {B}(5,8)\cap S^{-1} \mathscr {B}(7,8)]}{\mu _{inv}[ \mathscr {B}(5,8)]}=0. \end{aligned}$$
For \(q^*_{6j}\):
$$\begin{aligned} q^*_{60}= & {} \frac{\mu _{inv}[\mathscr {B}(6,8)\cap S^{-1}\mathscr {B}(0,8)]}{\mu _{inv}[\mathscr {B}(6,8)]}=0\nonumber \\ q^*_{61}= & {} \frac{\mu _{inv}[\mathscr {B}(6,8)\cap S^{-1}\mathscr {B}(1,8)]}{\mu _{inv}[\mathscr {B}(6,8)]}\nonumber \\= & {} \frac{\mu _{inv}[\mathscr {B}(22,64)\cup \mathscr {B}(54,64)]}{\mu _{inv}[\mathscr {B}(6,8)]}=\frac{2\times 1/6\times 1/8}{1/6}=\frac{1}{4}\nonumber \\ q^*_{62}= & {} \frac{\mu _{inv}[\mathscr {B}(6,8)\cap S^{-1}\mathscr {B}(2,8)]}{\mu _{inv}[\mathscr {B}(6,8)]}=0\nonumber \\ q^*_{63}= & {} \frac{\mu _{inv}[\mathscr {B}(6,8)\cap S^{-1}\mathscr {B}(3,8)]}{\mu _{inv}[\mathscr {B}(6,8)]}\nonumber \\= & {} \frac{\mu _{inv}[\mathscr {B}(14,64)\cup \mathscr {B}(46,64)]}{\mu _{inv}[\mathscr {B}(6,8)]}=\frac{2\times 1/6\times 1/8}{1/6}=\frac{1}{4}\nonumber \\ q^*_{64}= & {} \frac{\mu _{inv}[\mathscr {B}(6,8)\cap S^{-1}\mathscr {B}(4,8)]}{\mu _{inv}[\mathscr {B}(6,8)]}=0\nonumber \\ q^*_{65}= & {} \frac{\mu _{inv}[\mathscr {B}(6,8)\cap S^{-1}\mathscr {B}(5,8)]}{\mu _{inv}[\mathscr {B}(6,8)]}\nonumber \\= & {} \frac{\mu _{inv}[\mathscr {B}(6,64)\cup \mathscr {B}(38,64)]}{\mu _{inv}[\mathscr {B}(6,8)]}=\frac{2\times 1/6\times 1/8}{1/6}=\frac{1}{4}\nonumber \\ q^*_{66}= & {} \frac{\mu _{inv}[\mathscr {B}(6,8)\cap S^{-1}\mathscr {B}(6,8)]}{\mu _{inv}[\mathscr {B}(6,8)]}=0\nonumber \\ q^*_{67}= & {} \frac{\mu _{inv}[\mathscr {B}(6,8)\cap S^{-1}\mathscr {B}(7,8)]}{\mu _{inv}[\mathscr {B}(6,8)]}\nonumber \\= & {} \frac{\mu _{inv}[\mathscr {B}(30,64)\cup \mathscr {B}(62,64)]}{\mu _{inv}[\mathscr {B}(6,8)]}=\frac{2\times 1/6\times 1/8}{1/6}=\frac{1}{4}. \end{aligned}$$
And, finally, for \(q^*_{7j}\):
$$\begin{aligned} q^*_{70}= & {} \frac{\mu _{inv}[ \mathscr {B}(7,8)\cap S^{-1} \mathscr {B}(0,8)]}{\mu _{inv}[ \mathscr {B}(7,8)]}=0\nonumber \\ q^*_{71}= & {} \frac{\mu _{inv}[ \mathscr {B}(7,8)\cap S^{-1} \mathscr {B}(1,8)]}{\mu _{inv}[ \mathscr {B}(7,8)]}=0\nonumber \\ q^*_{72}= & {} \frac{\mu _{inv}[ \mathscr {B}(7,8)\cap S^{-1} \mathscr {B}(2,8)]}{\mu _{inv}[ \mathscr {B}(7,8)]}\nonumber \\= & {} \frac{\mu _{inv}[\mathscr {B}(7,64)\cup \mathscr {B}(23,64)\cup \mathscr {B}(39,64)\cup \mathscr {B}(55,64)]}{\mu _{inv}[ \mathscr {B}(7,8)]}=\frac{4\times 1/12\times 1/8}{1/12}=\frac{1}{2}\nonumber \\ q^*_{73}= & {} \frac{\mu _{inv}[ \mathscr {B}(7,8)\cap S^{-1} \mathscr {B}(3,8)]}{\mu _{inv}[ \mathscr {B}(7,8)]}=0\nonumber \\ q^*_{74}= & {} \frac{\mu _{inv}[ \mathscr {B}(7,8)\cap S^{-1} \mathscr {B}(4,8)]}{\mu _{inv}[ \mathscr {B}(7,8)]}=0\nonumber \\ q^*_{75}= & {} \frac{\mu _{inv}[ \mathscr {B}(7,8)\cap S^{-1} \mathscr {B}(5,8)]}{\mu _{inv}[ \mathscr {B}(7,8)]}=0\nonumber \\ q^*_{76}= & {} \frac{\mu _{inv}[ \mathscr {B}(7,8)\cap S^{-1} \mathscr {B}(6,8)]}{\mu _{inv}[ \mathscr {B}(7,8)]}\nonumber \\= & {} \frac{\mu _{inv}[\mathscr {B}(15,64)\cup \mathscr {B}(31,64)\cup \mathscr {B}(47,64)\cup \mathscr {B}(63,64)]}{\mu _{inv}[ \mathscr {B}(7,8)]}=\frac{4\times 1/12\times 1/8}{1/12}=\frac{1}{2}\nonumber \\ q^*_{77}= & {} \frac{\mu _{inv}[ \mathscr {B}(7,8)\cap S^{-1} \mathscr {B}(7,8)]}{\mu _{inv}[ \mathscr {B}(7,8)]}=0, \end{aligned}$$
The finite state Markov process is graphically represented in Fig. 1. All possible moves are shown: the associated transition probabilities set the widths of the arcs connecting two adjacent nodes, the thicker the arc, the larger the transition probability.
Fig. 1

A schematic layout of the network that defines the allowed moves for the finite state Markov chain. The colour of the links points to the departing node while the arrows indicate the direction of the flux. The widths of the connecting arcs reflect the associated transition weights, as specified in matrix (27) (colour figure online)
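The case-by-case computation of \(Q^*\) can be cross-checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes that S is the third iterate of the Collatz map and uses the fact, recalled in the proof, that each class \(\mathscr {B}(i,8^m)\) splits into 8 subclasses mod\(8^{m+1}\) of equal \(\mu _{inv}\)-measure, so that each entry reduces to a count divided by 8. The function names `collatz`, `S` and `transition_matrix` are illustrative choices.

```python
from fractions import Fraction


def collatz(n):
    """One step of the Collatz map."""
    return n // 2 if n % 2 == 0 else 3 * n + 1


def S(n):
    """Third iterate of the Collatz map."""
    return collatz(collatz(collatz(n)))


def transition_matrix(m=1):
    """Exact transition matrix on the classes mod 8**m (Q* for m = 1).

    Each class B(i, 8**m) contains 8 subclasses mod 8**(m+1) of equal
    measure, so q_ij = (number of subclasses sent by S into class j) / 8.
    """
    mod = 8 ** m
    Q = [[Fraction(0)] * mod for _ in range(mod)]
    # 8 * mod consecutive integers: one representative per residue mod 8**(m+1).
    for n in range(8 * mod, 16 * mod):
        Q[n % mod][S(n) % mod] += Fraction(1, 8)
    return Q


Q_star = transition_matrix(1)
for row in Q_star:
    print([str(x) for x in row])
```

The printed rows can be compared entry by entry with the matrix of Proposition 5; working with `Fraction` keeps the comparison exact rather than approximate.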

3 Collatz orbits are on average contracting

Building on the above, we can now prove that, on average, Collatz orbits are bounded. We begin by computing the invariant (stationary) distribution \(\mathbf {P}_{stat}\) of the Markov process defined by the irreducible transition matrix \(Q^*\). The stationary distribution \(\mathbf {P}_{stat}\) represents a constraint that stochastic trajectories must match on average. It will in turn allow us to draw conclusions on the late time behaviour of the deterministic orbits (prior to possible convergence to the Collatz cycle). We recall in fact that the transition probabilities of the Markov chain are computed by using a measure that is invariant (over the congruence classes of interest) under the deterministic map S. This allows us to study the asymptotic dynamics of the deterministic process via successive applications of the Markov chain (see Proposition 4).

Determining the stationary distribution \(\mathbf {P}_{stat}\) amounts to solving the following eigenvalue problem:
$$\begin{aligned} \mathbf {P}_{stat} Q^* = \mathbf {P}_{stat} \end{aligned}$$
In practical terms, the stationary distribution corresponds to the left eigenvector associated to the dominant eigenvalue (\(\lambda =1\), due to the stochasticity constraint) of the transition matrix \(Q^*\). Performing the above calculation immediately returns:
$$\begin{aligned} \mathbf {P}_{stat}=(1/6,1/12,1/6,1/12,1/6,1/12,1/6,1/12) \end{aligned}$$
To proceed in the analysis we prove the following Proposition:

Proposition 6

The finite-state Markov process defined by the transition matrix \(Q^*\) is irreducible and recurrent, hence ergodic.


The claim follows by direct inspection of \((Q^*)^2\), the second iterate of \(Q^*\). In fact, \([(Q^*)^2]_{ij}\ge 1/16\) for all i and j in \(\{0,\ldots ,7\}\).
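Both the stationary vector of Eq. (29) and the lower bound on the entries of \((Q^*)^2\) can be verified with exact rational arithmetic. The sketch below hard-codes \(Q^*\) as given in Proposition 5; the helpers `row` and `matmul` are illustrative names introduced here, not part of the original analysis.

```python
from fractions import Fraction as F

EVEN, ODD = (0, 2, 4, 6), (1, 3, 5, 7)


def row(value, cols):
    """Row of length 8 with `value` on the listed columns and 0 elsewhere."""
    return [value if j in cols else F(0) for j in range(8)]


# Q* as stated in Proposition 5 (rows indexed by the departing class).
Q_star = [
    row(F(1, 8), range(8)),
    row(F(1, 4), ODD),
    row(F(1, 4), EVEN),
    row(F(1, 2), (0, 4)),
    row(F(1, 4), EVEN),
    row(F(1, 4), EVEN),
    row(F(1, 4), ODD),
    row(F(1, 2), (2, 6)),
]


def matmul(A, B):
    """Plain matrix product on lists of lists."""
    return [[sum(a[k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for a in A]


P_stat = [F(1, 6), F(1, 12)] * 4
is_stationary = matmul([P_stat], Q_star)[0] == P_stat  # P_stat Q* = P_stat
Q2 = matmul(Q_star, Q_star)
min_entry = min(min(r) for r in Q2)  # smallest entry of (Q*)^2
print(is_stationary, min_entry)
```

A strictly positive `min_entry` means that every class can be reached from every other class in two steps, which is the property invoked in the proof.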

Recall now that, based on Proposition 4, one can elaborate on the late time dynamics of S (or equivalently, its restriction in mod8) by virtue of the introduced Markov chain. This observation translates into a viable strategy to prove the average contracting property of the deterministic map S. Roughly speaking, when the map S acts on a positive integer belonging to class \(\mathscr {B}(0,8)\), it yields a contraction factor equal to 1/8. If S instead acts on natural numbers of the classes \(\mathscr {B}(j,8)\), \(j=1,2,4,5,6\), it results in a contraction of 3/4. By contrast, S(n) produces an expansion with rate 9/2 for n belonging to classes \(\mathscr {B}(3,8)\) and \(\mathscr {B}(7,8)\). Hence the obtained information on \(\mathbf {P}_{stat}\) allows us to estimate the degree of contraction (or expansion) \(f_{Q^*}\) that trajectories should, on average, produce:
$$\begin{aligned} f_{Q^*} \simeq \left( \frac{1}{8}\right) ^{\omega _0}\left( \frac{3}{4}\right) ^{\omega _1} \left( \frac{9}{2}\right) ^{\omega _2}, \end{aligned}$$
where \(\omega _0=1/6\) is the probability of being in the congruence class \(j=0\), \(\omega _1=3/6+2/12=2/3\) the probability of being in the congruence classes \(j=1,2,4,5,6\) and \(\omega _2=2/12=1/6\) the probability of being in the congruence classes \(j=3,7\). The probabilities \(\omega _0\), \(\omega _1\), \(\omega _2\) follow from Eq. (29), while the contraction/expansion factors \(\frac{1}{8}\), \(\frac{3}{4}\) and \(\frac{9}{2}\) associated to each congruence class are made explicit in Eq. (3).

Carrying out the calculation yields \(f_{Q^*}= 3/4<1\), thus implying in turn that the average approach to the absorbing equilibrium is contracting. Observe that this latter contracting factor is here analytically determined, at variance with previous attempts that relied on heuristic reasoning. Notice that this preliminary estimate \(f_{Q^*}= 3/4\) has been obtained by just retaining the terms proportional to n in the definition of S(n), see Eq. (3), or, equivalently, by working with a sufficiently large n. Accounting for the constant (n independent) contributions in (3) does not modify the conclusion that we have reached: the generic orbit is always contracting, as is proved hereafter.
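For completeness, the arithmetic behind the value \(f_{Q^*}=3/4\) can be spelled out by inserting \(\omega _0=\omega _2=1/6\) and \(\omega _1=2/3\), and grouping the two factors that share the exponent 1/6:
$$\begin{aligned} \left( \frac{1}{8}\right) ^{1/6}\left( \frac{9}{2}\right) ^{1/6}\left( \frac{3}{4}\right) ^{2/3}=\left( \frac{9}{16}\right) ^{1/6}\left( \frac{3}{4}\right) ^{2/3}=\left( \frac{3}{4}\right) ^{1/3}\left( \frac{3}{4}\right) ^{2/3}=\frac{3}{4}. \end{aligned}$$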

Consider in fact Eq. (3), which defines the map S on the classes \(\mathscr {B}(i,8)\). We are in particular interested in the contraction/expansion factors associated to each transition among classes. The following upper bounds can be obtained:
$$\begin{aligned} \forall n\in \mathbb {N}\quad S(n)= {\left\{ \begin{array}{ll} \frac{n}{8}=:c_0(n_{min}) n&{}\quad \text {if}\, n\in \mathscr {B}(0,8)\\ \frac{3n+1}{4}< \frac{3}{4}\left( 1+\frac{1}{3n_{min}}\right) n =:c_1(n_{min}) n&{}\quad \text {if} \, n\in \mathscr {B}(1,8)\\ \frac{3n+2}{4}< \frac{3}{4}\left( 1+\frac{2}{3n_{min}}\right) n=:c_2(n_{min}) n &{}\quad \text {if}\, n\in \mathscr {B}(2,8)\\ \frac{9n+5}{2}< \frac{9}{2}\left( 1+\frac{5}{9n_{min}}\right) n=:c_3(n_{min}) n&{}\quad \text {if} \, n\in \mathscr {B}(3,8)\\ \frac{3n+4}{4}< \frac{3}{4}\left( 1+\frac{4}{3n_{min}}\right) n=:c_4(n_{min}) n&{}\quad \text {if}\, n\in \mathscr {B}(4,8)\\ \frac{3n+1}{4}< \frac{3}{4}\left( 1+\frac{1}{3 n_{min}}\right) n=:c_5(n_{min}) n&{}\quad \text {if}\, n\in \mathscr {B}(5,8)\\ \frac{3n+2}{4}< \frac{3}{4}\left( 1+\frac{2}{3n_{min}}\right) n=:c_6(n_{min}) n&{}\quad \text {if}\, n\in \mathscr {B}(6,8)\\ \frac{9n+5}{2}< \frac{9}{2}\left( 1+\frac{5}{9n_{min}}\right) n=:c_7(n_{min}) n&{}\quad \text {if}\, n\in \mathscr {B}(7,8), \end{array}\right. } \end{aligned}$$
where the constants \(c_i(n_{min})\) are defined by the right-hand sides of the previous relations. Here \(n_{min}\) stands for the smallest integer visited by the system in its quasi-stationary state (i.e. before it eventually hits the absorbing Collatz cycle, if this is the case). Since \(\{1,2,4\}\) belong to the Collatz cycle, and because we are solely focusing on the dynamics that precedes the possible convergence to the Collatz cycle, we will set \(n_{min}=3\). With this choice, one gets:
$$\begin{aligned}&c_0=\frac{1}{8},\quad c_1=\frac{3}{4}\frac{10}{9},\quad c_2=\frac{3}{4}\frac{11}{9},\quad c_3=\frac{9}{2}\frac{32}{27},\quad c_4=\frac{3}{4}\frac{13}{9},\quad c_5=\frac{3}{4}\frac{10}{9}, \nonumber \\&c_6=\frac{3}{4}\frac{11}{9}\quad \text { and}\quad c_7=\frac{9}{2}\frac{32}{27}, \end{aligned}$$
where the explicit reference to \(n_{min}=3\) has been dropped in the definition of the symbols \(c_i\), \(i=0,\ldots ,7\).
We have therefore:
$$\begin{aligned} f_{Q^*} \le c_0^{{\varOmega }_0} c_1^{{\varOmega }_1} c_2^{{\varOmega }_2} c_3^{{\varOmega }_3}c_4^{{\varOmega }_4}, \end{aligned}$$
where \({\varOmega }_0=\omega _0=1/6\) is the probability of being in the congruence class \(j=0\), \({\varOmega }_1=1/6\) the probability of being in the congruence classes \(j=1,5\), \({\varOmega }_2=1/3\) the probability of being in the congruence classes \(j=2,6\), \({\varOmega }_3=1/6\) the probability of being in the congruence classes \(j=3,7\) and \({\varOmega }_4=1/6\) the probability of being in the congruence class \(j=4\). Performing the calculation yields \(f_{Q^*} \le 0.8926\). The dynamics of S is therefore contracting and trajectories are on average attracted towards the three fixed points as identified above, namely the entries of the Collatz cycle \(\{1,2,4\}\).
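The numerical value of the bound can be reproduced directly from the constants \(c_i(n_{min})\) and the stationary weights. The snippet below is a minimal sketch under the assumptions stated in the text (stationary weights 1/6 for even classes and 1/12 for odd ones); the function names `c_factors` and `contraction_bound` are illustrative. It also shows how the bound approaches 3/4 as \(n_{min}\) grows.

```python
import math
from fractions import Fraction as F


def c_factors(n_min):
    """Upper-bound factors c_i(n_min), i = 0..7, for S on the classes B(i,8)."""
    q34, q92 = F(3, 4), F(9, 2)
    return [
        F(1, 8),                        # class 0: S(n) = n/8
        q34 * (1 + F(1, 3 * n_min)),    # class 1: (3n+1)/4
        q34 * (1 + F(2, 3 * n_min)),    # class 2: (3n+2)/4
        q92 * (1 + F(5, 9 * n_min)),    # class 3: (9n+5)/2
        q34 * (1 + F(4, 3 * n_min)),    # class 4: (3n+4)/4
        q34 * (1 + F(1, 3 * n_min)),    # class 5: (3n+1)/4
        q34 * (1 + F(2, 3 * n_min)),    # class 6: (3n+2)/4
        q92 * (1 + F(5, 9 * n_min)),    # class 7: (9n+5)/2
    ]


# Stationary weights of the eight classes: 1/6 (even) and 1/12 (odd).
P_STAT = [F(1, 6), F(1, 12)] * 4


def contraction_bound(n_min):
    """Weighted geometric mean of the c_i(n_min) under P_STAT."""
    log_bound = sum(float(p) * math.log(float(c))
                    for p, c in zip(P_STAT, c_factors(n_min)))
    return math.exp(log_bound)


print(round(contraction_bound(3), 4))
print(round(contraction_bound(10 ** 9), 4))
```

The first printed value corresponds to the choice \(n_{min}=3\) made in the text, while the second illustrates the large-n limit in which only the terms proportional to n are retained.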
The previous observation can also be made rigorous at the level of single trajectories [5]. Consider in fact a generic Markov chain with state space X and transition matrix Q. Assume the Markov chain to be irreducible and positive recurrent. Let \(\pi \) denote the unique invariant probability measure and consider a non-negative function \(f : X \mapsto \mathbb {R}\), summable with respect to \(\pi \). Then for a.e. \(n_0 \in X\), we recall that
$$\begin{aligned} \lim _{k\rightarrow \infty } \frac{1}{k}\sum _{j=0}^{k-1}f(S^{\circ j}(n_0)) =\sum f(x) \pi (x). \end{aligned}$$
Taking \(f=\delta _i\), \(i\in \{0,\ldots ,7\}\), the latter result implies that the entries of the stationary distribution represent the fraction of time spent by the Markov chain in each of the eight classes (see Appendix A), more precisely for almost every orbit of the stochastic process, \(X_j\), we have for all \(i\in \{0,\ldots ,7\}\):
$$\begin{aligned} \lim _{k\rightarrow \infty }\frac{\#\{0\le j \le k-1: X_j=i\}}{k}=\lim _{k\rightarrow \infty }\frac{1}{k}\sum _{j=0}^{k-1}\delta _{X_j=i}= (P_{stat})_i. \end{aligned}$$
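The statement on visiting frequencies can be illustrated by simulating the finite-state chain. The following is only a sketch: it relies on the observation that every row of \(Q^*\) is uniform on its support, and it uses a fixed seed and a finite number of steps, so the agreement with \(\mathbf {P}_{stat}\) is approximate. The names `SUPPORT` and `visit_frequencies` are illustrative.

```python
import random
from collections import Counter

EVEN, ODD = (0, 2, 4, 6), (1, 3, 5, 7)
# Allowed target classes from each class; Q* is uniform on each support.
SUPPORT = [tuple(range(8)), ODD, EVEN, (0, 4), EVEN, EVEN, ODD, (2, 6)]


def visit_frequencies(steps, start=0, seed=0):
    """Fraction of time spent in each class along one stochastic orbit."""
    rng = random.Random(seed)
    state, counts = start, Counter()
    for _ in range(steps):
        counts[state] += 1
        state = rng.choice(SUPPORT[state])
    return [counts[i] / steps for i in range(8)]


freq = visit_frequencies(200_000)
print([round(f, 3) for f in freq])
```

The empirical frequencies should be close to 1/6 for the even classes and to 1/12 for the odd ones, with fluctuations that shrink as the number of steps is increased.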
Equipped with the above one can prove that almost all orbits generated by S are bound to contract and hence converge to the Collatz cycle. To this end let us assume the existence of an initial datum \(n_0 \in \mathbb {N}\) associated to a diverging Collatz orbit, \(n_k=S^{\circ k}(n_0)\), \(k\ge 0\). From definition (32) it obviously follows \(S(n)<c_i n\) for \(n\in \mathscr {B}(i,8)\) and thus
$$\begin{aligned} n_k=S(n_{k-1})<c_{i_{k-1}}n_{k-1}=c_{i_{k-1}}S(n_{k-2})<c_{i_{k-1}} c_{i_{k-2}}n_{k-2}<\cdots < c_{i_{k-1}}\ldots c_{i_{0}}n_0. \end{aligned}$$
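The chain of inequalities can be checked on actual orbits. The sketch below assumes that S is the third iterate of the Collatz map and uses the constants \(c_i\) for \(n_{min}=3\) listed above. A non-strict comparison is used because equality is attained at \(n=n_{min}=3\), where \(S(3)=16=c_3\cdot 3\). The function name `orbit_with_bounds` is illustrative.

```python
from fractions import Fraction as F


def S(n):
    """Third iterate of the Collatz map."""
    for _ in range(3):
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    return n


# Constants c_i for n_min = 3, as listed in the text.
C = [F(1, 8), F(5, 6), F(11, 12), F(16, 3),
     F(13, 12), F(5, 6), F(11, 12), F(16, 3)]


def orbit_with_bounds(n0):
    """Pairs (n_k, c_{i_{k-1}} ... c_{i_0} n_0) until the orbit enters {1, 2, 4}."""
    n, bound, pairs = n0, F(n0), []
    while n not in (1, 2, 4):
        bound *= C[n % 8]   # factor attached to the class of the current n
        n = S(n)
        pairs.append((n, bound))
    return pairs


pairs = orbit_with_bounds(27)
print(all(n <= b for n, b in pairs), pairs[-1][0])
```

The loop terminates only for initial data whose orbit reaches the cycle, which is the case for all the small values tested here; the sketch says nothing about initial data for which this is not known.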
We now proceed by defining the quantity \(Y_k=\log c_{i_k}\). As the trajectory stemming from \(n_0\) is assumed by hypothesis diverging, and hence constituted by an infinite number of entries, one gets:
$$\begin{aligned} \lim _{k\rightarrow \infty }\frac{1}{k}\sum _{j=0}^{k-1}Y_j= & {} \frac{\log c_0}{6}+\frac{\log c_1}{12}+\frac{\log c_2}{6}+\frac{\log c_3}{12}+\frac{\log c_4}{6}+\frac{\log c_5}{12}+\frac{\log c_6}{6}+\frac{\log c_7}{12}\nonumber \\\sim & {} -0.1136=\alpha , \end{aligned}$$
where the factors 1/6 and 1/12 that weight \(\log c_i\) follow from the stationary distribution \(\mathbf {P}_{stat}\) as computed above.
It is therefore always possible to choose an integer \(k(n_0)>0\) such that, for all \(k>k(n_0)\), one has
$$\begin{aligned} \sum _{j=0}^{k-1}Y_j \le \frac{\alpha }{2}k. \end{aligned}$$
Notice that in the above bound the factor 1 / 2 is arbitrary and does not bear any degree of specificity. It hence follows:
$$\begin{aligned} c_{i_{k-1}}\ldots c_{i_{0}} =e^{\sum _{j=0}^{k-1}Y_j}\le \beta ^k \quad \forall k>k(n_0), \end{aligned}$$
where \(\beta =e^{\alpha /2}\sim 0.944\).
Summing up, under the assumption of the existence of a diverging orbit we have proved that, for a sufficiently large stopping time [1] k, one has:
$$\begin{aligned} S^{\circ k}(n_0)=n_k \le \beta ^k n_0. \end{aligned}$$
Since \(\beta <1\), this contradicts the assumption of dealing with a diverging orbit. The above result holds for \(\mu _{inv}\)-almost every initial condition \(n_0\). In other words, for \(\mu _{inv}\)-almost every initial condition \(n_0\) no diverging orbit can exist.

The above results follow from a dynamical constraint on the equilibrium of S, as obtained by partitioning the integers into 8 congruence classes. The visiting frequencies are in fact obtained by computing the stationary distribution of a Markov analogue of the deterministic dynamics, which runs on a finite alphabet of 8 states. What happens if the analysis is progressively refined to smaller scales, by working on the classes \(\mathscr {B}(i,8^m)\), for any given choice of m? We shall adapt the Markov approach to account for this generalisation and prove that orbits are contracting at any given degree of resolution, i.e. when sampling the equilibrium on the equivalence classes \(\mathscr {B}(i,8^m)\) for m as large as desired. Remarkably, the computed upper bound for the contraction factor is independent of m and equal to the value obtained in Eq. (33).

4 The Collatz dynamics is contracting at the finest scale

In this section we will prove that Collatz orbits are on average contracting when seen on the equivalence classes \(\mathscr {B}(i,8^m)\), \(i\in \{0,\ldots , 8^m-1\}\) for any \(m>1\). Moreover, the computed upper bound for the contracting factors is identical to that obtained above, when operating with classes \(\mathscr {B}(i,8)\), see Eq. (33), and thus independent of m.

To this aim, the first step is to compute the transition probabilities \(q_{ij}(m)\). Each of these quantifies the probability of reaching class \(\mathscr {B}(j,8^m)\) when starting from class \(\mathscr {B}(i,8^m)\), that is the conditional probability \(P[S(x)\in \mathscr {B}(j,8^m)| x\in \mathscr {B}(i,8^m)]\). In formulae:
$$\begin{aligned} q_{ij}(m):=\frac{\mu _{inv}[\mathscr {B}(i,8^m)\cap S^{-1}\mathscr {B}(j,8^m)]}{\mu _{inv}[\mathscr {B}(i,8^m)]}\quad (i,j=0,\ldots ,8^m-1). \end{aligned}$$
Label Q(m) the \(8^m \times 8^m\) matrix formed by the entries (37), for any given m. We can then prove the following proposition:

Proposition 7

The matrices Q(m) are stochastic for all \(m\ge 1\). Moreover the unique stationary distribution \(\mathbf {P}_{stat}(m)\), solution of
$$\begin{aligned} \mathbf {P}_{stat}(m)Q(m)=\mathbf {P}_{stat}(m), \end{aligned}$$
is the vector of \(\mathbb {R}^{8^m}\) given by
$$\begin{aligned} \mathbf {P}_{stat}(m)=(a,b,\ldots ,a,b), \end{aligned}$$
where \(a=1/6\times 1/8^{m-1}\) and \(b=1/12\times 1/8^{m-1}\).


The first claim can be proved by observing that \(S^{-1}\mathscr {B}(j,8^m)\) is made of the union of disjoint classes \(\mathscr {B}(l_j,8^{m+1})\), where the set of indexes \(l_j\) depends on the initial class \(\mathscr {B}(j,8^m)\). Hence
$$\begin{aligned} \bigcup _{j=0}^{8^m-1}(\mathscr {B}(i,8^m)\cap S^{-1}\mathscr {B}(j,8^m))= & {} \mathscr {B}(i,8^m)\cap \left( \bigcup _{j=0}^{8^m-1} S^{-1}\mathscr {B}(j,8^m)\right) \\= & {} \mathscr {B}(i,8^m)\cap \left( \bigcup _{l=0}^{{8^{m+1}-1}} \mathscr {B}(l,{8^{m+1}})\right) \\= & {} \mathscr {B}(i,8^m)\cap \mathbb {N}=\mathscr {B}(i,8^m), \end{aligned}$$
and thus from Eq. (37):
$$\begin{aligned} \sum _{j=0}^{8^m-1} q_{ij}(m)= & {} \sum _{j=0}^{8^m-1}\frac{\mu _{inv}[\mathscr {B}(i,8^m)\cap S^{-1}\mathscr {B}(j,8^m)]}{\mu _{inv}[\mathscr {B}(i,8^m)]}\nonumber \\= & {} \frac{1}{\mu _{inv}[\mathscr {B}(i,8^m)]}\mu _{inv}[\mathscr {B}(i,8^m)]=1. \end{aligned}$$
Let us now prove that \(\mathbf {P}_{stat}(m)=(a,b,\ldots ,a,b)\), with \(a=1/6\times 1/8^{m-1}\) and \(b=1/12\times 1/8^{m-1}\), is the (unique) eigenvector associated to the eigenvalue equal to one (notice that the latter exists because Q(m) is stochastic). Let us start by computing
$$\begin{aligned} \sum _i (\mathbf {P}_{stat}(m))_iq_{ij}(m)=a\sum _{i\, \text {even}} q_{ij}(m)+b\sum _{i\, \text {odd}} q_{ij}(m). \end{aligned}$$
Observe then that, apart from the normalising factor \(1/8^{m-1}\), \(q_{ij}(m)\) is given by the number of distinct solutions of the linear congruence equations \(n\equiv i\) mod\(8^m\) and \(S(n)\equiv j\) mod\(8^m\), namely \(N_{ij}:=\#\{ \mathscr {B}(i,8^m)\cap S^{-1}\mathscr {B}(j,8^m)\}\). This allows us to rewrite Eq. (38) as follows
$$\begin{aligned} (\mathbf {P}_{stat}(m)Q(m))_j =\frac{a}{8} \sum _{i\, \text {even}} N_{ij}+\frac{b}{8} \sum _{i\, \text {odd}} N_{ij}, \end{aligned}$$
where use has been made of the definition of the measure \(\mu _{inv}^{(m)}\).
By recalling Proposition 3 we can now write
$$\begin{aligned} \mathscr {B}(i,8^m)\cap S^{-1}\mathscr {B}(j,8^m)=\mathscr {B}(i,8^m)\cap \big (A^{(m)}_e(j)\cup A^{(m)}_o(j)\big ), \end{aligned}$$
where \(A^{(m)}_e(j)\) is the union of disjoint classes \(\mathscr {B}(l,8^{m+1})\) with l even and \(A^{(m)}_o(j)\) is the union of disjoint classes \(\mathscr {B}(l,8^{m+1})\) with l odd.
By invoking again Proposition 3 we can write
$$\begin{aligned} \sum _{i\, \text {even}} N_{ij}= {\left\{ \begin{array}{ll} 5&{}\quad \text {if}\, j\, {\text {is\quad even}}\\ 3&{}\quad \text {if}\, j\, {\text {is\quad odd}} \end{array}\right. } \end{aligned}$$
$$\begin{aligned} \sum _{i\, \text {odd}} N_{ij}= {\left\{ \begin{array}{ll} 6&{}\quad \text {if}\, j\, \text {is\quad even}\\ 2&{}\quad \text {if}\, j\, \text {is\quad odd} \end{array}\right. } \end{aligned}$$
Hence, in conclusion
$$\begin{aligned} {\left\{ \begin{array}{ll} \frac{5}{8}a+\frac{6}{8}b=a &{}\quad \text {if}\, j\, \text {is\quad even}\\ \frac{3}{8}a+\frac{2}{8}b=b &{}\quad \text {if}\, j\, \text {is\quad odd} \end{array}\right. } \end{aligned}$$
which returns \(a=1/6\) and \(b=1/12\) as the sole non trivial solution. Observe that the factor \(1/8^{m-1}\) is needed to normalise the 1-norm of the vector to 1. Finally let us observe that this latter conclusion generalises to the non relatively prime case under study the results obtained by [3, 4] for relatively prime settings.
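The counting argument and the resulting stationary vector can be checked by brute force for small m. The sketch below assumes, as in the proof, that each class \(\mathscr {B}(i,8^m)\) splits into 8 subclasses mod\(8^{m+1}\) of equal measure, so that \(q_{ij}(m)=N_{ij}/8\); the function name `check_proposition` is illustrative.

```python
from fractions import Fraction as F


def S(n):
    """Third iterate of the Collatz map."""
    for _ in range(3):
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    return n


def check_proposition(m):
    """Return (is P_stat(m) stationary?, even-row column sums, odd-row column sums)."""
    mod = 8 ** m
    # N[i][j]: number of classes mod 8**(m+1) inside B(i,8**m) sent into B(j,8**m).
    N = [[0] * mod for _ in range(mod)]
    for n in range(8 * mod, 16 * mod):   # one representative per class mod 8**(m+1)
        N[n % mod][S(n) % mod] += 1
    a = F(1, 6 * 8 ** (m - 1))
    b = a / 2
    p = [a if i % 2 == 0 else b for i in range(mod)]
    pQ = [sum(p[i] * N[i][j] for i in range(mod)) / 8 for j in range(mod)]
    even_rows = [sum(N[i][j] for i in range(0, mod, 2)) for j in range(mod)]
    odd_rows = [sum(N[i][j] for i in range(1, mod, 2)) for j in range(mod)]
    return pQ == p, even_rows, odd_rows


for m in (1, 2):
    ok, even_rows, odd_rows = check_proposition(m)
    print(m, ok, sorted(set(even_rows)), sorted(set(odd_rows)))
```

For each tested m the even-row sums take the values 5 and 3 and the odd-row sums the values 6 and 2, according to the parity of j, in agreement with the two relations used to fix the ratio between a and b.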

Remark 5

As shown above, the Markov process characterised by the stochastic matrix Q(m) admits a unique stationary state, with all non trivial entries. Then it is immediate to show that it is also irreducible and recurrent. In fact for arbitrarily large k, all rows of \(Q(m)^k\) are identical to the stationary distribution \(\mathbf {P}_{stat}(m)\). Its components are therefore strictly positive.

We are now in a position to quantify the degree of contraction/expansion that characterises the deterministic dynamics, as seen on the classes \(\mathscr {B}(i,8^m)\), \(m\ge 1\) and \(i=0,\ldots ,8^m-1\). This enables us to generalise the above analysis beyond the rather specific choice \(m=1\). To this end let us define \(i^{(m)}(n)=i\) if \(n\in \mathscr {B}(i,8^m)\), that is, the indicator function of the classes \(\mathscr {B}(i,8^m)\). Observe that the contraction/expansion of S is ultimately determined by \(i^{(1)}(n)\) (see Eq. (31)). To prove the contracting character of the map we proceed as above, and assume that the orbit stemming from \(n_0\) is diverging, hence by definition made of an infinite set of entries.

Introduce then the function \(W(n)=\log c_{i^{(m)}(n)}\). Since the time spent by the system in each of the available classes is determined by the components of the eigenvector with eigenvalue equal to one, for a.e. \(n_0\) we have
$$\begin{aligned} \lim _{k\rightarrow \infty }\frac{1}{k}\sum _{j=0}^{k-1}W(S^{\circ j}(n_0))= \left[ \frac{\sum _{\ell \,\in \{0,2,4,6\}}\log c_\ell }{6\; 8^{m-1}}+\frac{\sum _{\ell \, \in \{1,3,5,7\}}\log c_\ell }{12\; 8^{m-1}} \right] 8^{m-1}=\alpha , \end{aligned}$$
where \(1/6 \times 1/8^{m-1}\) and \(1/12 \times 1/8^{m-1}\) are the entries of the stationary distribution \(\mathbf {P}_{stat}(m)\) (i.e. the fraction of time spent in each of the \(8^{m}\) classes) and the overall factor \(8^{m-1}\) is due to the fact that, for any fixed \(i^{(1)}=i\) with \(i\in \{0,\ldots ,7\}\), there are \(8^{m-1}\) classes mod\(8^m\) whose index \(i^{(m)}\) is congruent to i mod8. In conclusion the map S is on average contracting on the classes \(\mathscr {B}(i,8^m)\). As an additional, important remark, we notice that the upper bound for the average contraction factor, as obtained when representing the dynamics of S on the \(8^{m}\) equivalence classes, is identical to that given by Eq. (33).
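The weighted average above is straightforward to evaluate once the factors \(c_k\) are given. The sketch below is a hedged illustration only: the factors of Eq. (31) are not reproduced in this section, so the code uses as stand-ins the large-n slopes of the third iterate on each residue class mod8 (1/8 for class 0, 9/2 for classes 3 and 7, 3/4 otherwise). These values, and the names `ASSUMED_FACTORS` and `average_log_factor`, are assumptions introduced for the example and may differ from the bounds actually employed in the paper.

```python
import math

# ASSUMPTION: stand-ins for the factors c_k of Eq. (31). They are the
# asymptotic slopes of S on n = 8q + k, e.g. S(8q) = q and S(8q+3) = 36q + 16.
ASSUMED_FACTORS = {0: 1 / 8, 1: 3 / 4, 2: 3 / 4, 3: 9 / 2,
                   4: 3 / 4, 5: 3 / 4, 6: 3 / 4, 7: 9 / 2}

def average_log_factor(factors, m=1):
    # Weighted average of log c_k with weights 1/(6 * 8^(m-1)) on even
    # classes and 1/(12 * 8^(m-1)) on odd classes, times the multiplicity
    # 8^(m-1) of each residue mod 8 among the classes mod 8^m.
    scale = 8 ** (m - 1)
    even = sum(math.log(factors[k]) for k in (0, 2, 4, 6))
    odd = sum(math.log(factors[k]) for k in (1, 3, 5, 7))
    return (even / (6 * scale) + odd / (12 * scale)) * scale

if __name__ == "__main__":
    for m in (1, 2, 3):
        print(m, average_log_factor(ASSUMED_FACTORS, m))
```

With these stand-in factors the average is negative and does not depend on m, since the factors \(8^{m-1}\) cancel; its value coincides with \(\log (3/4)\) up to rounding.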

The remarkable conclusion is therefore that the third iterate of the Collatz map is always contracting, when seen on the equivalence classes \(\mathscr {B}(i,8^m)\), for m as large as desired, and that the estimated bound for the contraction factor is independent of the index m. In other words, we can make the number of classes as large as we wish (and consequently reduce their size so as to approach the singletons with arbitrary accuracy), while still detecting a contracting deterministic dynamics, with a constant (independent of m) bound for the rate of contraction. As previously remarked, when the limit \(m\rightarrow \infty \) is performed, the measure of the classes, and hence of the singletons, converges to zero. Although the contraction factor stays constant for any arbitrarily large m, it seems that we cannot rule out the existence of zero-measure orbits that violate this constraint.

Remark 6

In principle it would be tempting to consider a different measure that remains non-trivial in the limit \(m\rightarrow \infty \). This measure must however be invariant under S, which implies that it should asymptotically concentrate only on the Collatz integers 1, 2, 4, i.e. the supposed attractors of the deterministic dynamics. On the other hand, the invariance requirement at the coarse-grained scale necessitates dealing with a uniform (except for the weights 1/6 and 1/12) measure on the equivalence classes. This latter request cannot be reconciled with the need for a non-uniform measure at the level of the integers, which makes it impossible to find a modified measure that is both non-trivial and invariant on the singletons.

5 Conclusions

In this paper we have provided an analytical argument to support the validity of the so-called Collatz conjecture, a long-standing problem in mathematics which dates back to 1937. The analysis builds on three main pillars. In short, we (i) introduced the (forward) third iterate of the Collatz map (so as to reduce the analysis of the period 3 cycle to a search for a fixed point) and considered the equivalence classes of integer numbers modulo 8; (ii) defined a Markov chain (based on a suitable non-trivial measure) which runs on a finite set of states and whose transition probabilities reflect the deterministic map; (iii) showed that orbits are on average contracting, as follows from a strict bound that combines the visiting frequencies, as derived in the framework of the aforementioned stochastic picture, and the contraction/expansion factors associated with each transition among classes. Notice that the conclusion reached holds for any level of imposed coarse graining, i.e. when computing the visiting frequencies on the partition in mod\(8^m\) classes, with m as large as desired. Although the measure introduced cannot be extended to weight individual singletons, we can prove that the Collatz dynamics is contracting on uniform partitions of the natural numbers in classes. These partitions can be refined to approximate singletons with any desired accuracy, without eventually converging to them.


  1. We shall hereafter label the rows and columns of \(Q^*\) with indexes running from 0 up to 7, rather than from 1 up to 8, as is customarily done.

  2. Observe that formally \(k=k(m)\).

  3. In principle one could also consider the additional constraint on \(n_{min}\), imposed by the class of relative pertinence.



We would like to thank the numerous colleagues who interacted with us all along the various stages of the writing of this work. In particular we warmly thank Claudio Bonanno, Carlo Carminati, Jean-Charle Delvenne, Craig Alan Feinstein, Steffen Kionke, Shlomo Levental, Stefano Marmi, Vassilis Papanicolaou, François Stealens and Cédric Villani, for their insightful comments and remarks. The work of T.C. presents research results of the Belgian Network DYSCO (Dynamical Systems, Control, and Optimization), funded by the Interuniversity Attraction Poles Programme, initiated by the Belgian State, Science Policy Office.


  1. Lagarias, J.C.: The Ultimate Challenge: The 3x+1 Problem. American Mathematical Society, Providence, USA (2010)
  2. Oliveira, T., Silva, E.: Maximum excursion and stopping time record holders for the 3x+1 problem: computational results. Math. Comput. 68(1), 371–384 (1999)
  3. Matthews, K.R., Watts, A.M.: A Markov approach to the generalised Syracuse algorithm. Acta Arith. 45, 29 (1985)
  4. Buttsworth, R.N., Matthews, K.R.: On some Markov matrices arising from the generalised Collatz mapping. Acta Arith. 55 (1985)
  5. Cox, D.R., Miller, H.D.: The Theory of Stochastic Processes. CRC Press, Florida, USA (1977)

Copyright information

© Unione Matematica Italiana 2017

Authors and Affiliations

  1. Department of Mathematics and Namur Institute for Complex Systems-naXys, University of Namur, Namur, Belgium
  2. Dipartimento di Fisica e Astronomia, University of Florence, INFN and CSDC, Florence, Italy
