Background

It is well-known that the first digits of many numerical data sets are not uniformly distributed. Newcomb (1881) and Benford (1938)) observed that the first digits of many series of real numbers obey Benford’s law

$$ P^{B} (d) = \log_{10} (1 + d) - \log_{10} (d),\quad d = 1,2, \ldots ,9. $$
(1)

The increasing knowledge about Benford’s law and its applications has been collected in two recent books by Berger and Hill (2015) and Miller (2015). In number theory, it is known that for any fixed power exponent \( s \ge 1 \), the first digits of some integer sequences, like integer powers and square-free integer powers, follow asymptotically a Generalized Benford law (GBL) with exponent \( \alpha = s^{ - 1} \in (0,1] \) [see Hürlimann (2004), (2014a)] such that

$$ P_{\alpha }^{GB} (d) = \frac{{(1 + d)^{\alpha } - d^{\alpha } }}{{10^{\alpha } - 1}},\quad d = 1,2, \ldots ,9. $$
(2)

Clearly, the limiting case \( \alpha \to 0 \), respectively \( \alpha = 1 \), of (2) converges weakly to Benford’s law (1), respectively the uniform distribution. It is expected that many further integer power sequences follow a GBL with size-dependent parameter. However, if asymptotically such an exponent exists, it will not always be exactly \( \alpha = s^{ - 1} \). Hürlimann (2014b) shows that the first digits of powers from perfect power numbers follow asymptotically a GBL with parameter \( \alpha = (2s)^{ - 1} \).

Based on similar statistical analysis, the first digits of powerful integer powers are studied. For this, the GBL is fitted to samples of first digits using two goodness-of-fit measures, namely the MAD measure (mean absolute deviation) and the WLS measure (probability weighted least square or Chi square statistics). In “Size-dependent generalized Benford law for powerful integer powers”, the minimum MAD and WLS estimators of the GBL over finite ranges of powerful integer powers up to \( 10^{s \cdot m} , \, m \ge 8 \), \( s \ge 1 \) a fixed power exponent, are determined. Calculations illustrate the convergence of the size-dependent GBL with minimum MAD and WLS estimators to the GBL with exponent \( (2s)^{ - 1} \). Moreover, the existence of a one-parametric size-dependent exponent function that converges to these GBL’s is established, and an optimal value that minimizes its absolute deviation to the minimum MAD and WLS estimators is determined. A mathematical proof of the asymptotic convergence of the finite sequences to the GBL with exponent \( (2s)^{ - 1} \) follows in “Asymptotic counting function for powerful integer powers”.

Size-dependent generalized Benford law for powerful integer powers

A powerful number is a positive integer that is divisible by the square of each of its prime factors. It is of the form \( n = a^{2} b^{3} \) for some natural numbers \( a,\;b \ge 1 \). To investigate the optimal fitting of the GBL to first digit sequences of powers from powerful numbers, it is necessary to specify goodness-of-fit (GoF) measures according to which optimality should hold. For this purpose, the following two GoF measures are used. Let \( \{ x_{n} \} \subset [ 1 ,\infty ), \, n \ge 1 \), be an integer sequence, and let \( d_{n} \) be the (first) significant digit of \( x_{n} \). The number of \( x_{n} \)’s, \( n = 1, \ldots ,N \), with significant digit \( d_{n} = d \) is denoted by \( X_{N} (d) \). The MAD measure or mean absolute deviation measure for the GBL is defined to be

$$ MAD_{N} (\alpha ) = \frac{1}{9} \cdot \sum\limits_{d = 1}^{9} {\left| {P_{\alpha }^{GB} (d) - \tfrac{{X_{N} (d)}}{N}} \right|} . $$
(3)

This measure has been used to assess conformity to Benford’s law by Nigrini (2000) [see also Nigrini (2012), Table 7.1, p. 160]. The WLS measure for the GBL is defined by

$$ WLS_{N} (\alpha ) = \sum\limits_{d = 1}^{9} {\frac{{(P_{\alpha }^{GB} (d) - \tfrac{{X_{N} (d)}}{N})^{2} }}{{P_{\alpha }^{GB} (d)}}} . $$
(4)

Consider now the sequence of powerful integer powers \( \{ n_{pf}^{s} \} , \, n_{pf}^{s} < 10^{s \cdot m} \), for a fixed power exponent \( s = 1,2,3, \ldots \), and arbitrary powerful numbers \( n_{pf} \) below \( 10^{m} , \, m \ge 8 \). Denote by \( I_{k}^{s} (d) \) the number of powerful integer powers below \( 10^{k} , \, k \ge 1 \), with first digit \( d \). This number is defined recursively by the relationship

$$ I_{k + 1}^{s} (d) = S(\sqrt[s]{{(d + 1) \cdot 10^{k} }}) - S(\sqrt[s]{{d \cdot 10^{k} }}) + I_{k}^{s} (d),\quad k = 1,2, \ldots , $$
(5)

where the counting function \( S(n) \) is given by Golomb (1970) as

$$ S(n) = \sum\limits_{k = 1}^{{\left\lfloor {\sqrt[3]{n}} \right\rfloor }} {\mu^{2} (k) \cdot \left\lfloor {\sqrt {\tfrac{n}{{k^{3} }}} } \right\rfloor } , $$
(6)

where \( \mu (k) \) is the Möbius function such that \( \mu (k) = 0 \) if \( p^{2} \) divides \( k \) and \( \mu (k) = ( - 1)^{e} \) if \( k \) is a square-free number with \( e \) distinct prime factors, and \( \left\lfloor \cdot \right\rfloor \) denotes the integer-part function. Recent algorithms to efficiently compute the Möbius function and the related counting function of square-free integers are contained in Pawlewicz (2011) and Auil (2013).

With \( N = S(10^{m} ) \) one has \( X_{N} (d) = I_{s \cdot m}^{s} (d) \) in (3)–(4). A list of the \( I_{s \cdot m}^{s} (d), \, m = 8, \ldots ,15 \), \( s = 1,2,3,4,5 \), together with the sample size \( N = S(10^{m} ) \), is provided in Table 3 of the “Appendix”. Based on this, the so-called minimum MAD and minimum WLS estimators of the GBL are determined. Together with their GoF measures, these optimal estimators are reported in Table 1 below. Note that the minimum WLS is a critical point of the equation

$$ \begin{aligned} \frac{\partial }{\partial \alpha }WLS_{N} (\alpha ) = \frac{1}{N} \cdot \sum\limits_{d = 1}^{9} {\frac{{\partial P_{\alpha }^{GB} (d)}}{\partial \alpha } \cdot \frac{{P_{\alpha }^{GB} (d)^{2} - (\tfrac{{X_{N} (d)}}{N})^{2} }}{{P_{\alpha }^{GB} (d)^{2} }}} = 0, \hfill \\ \frac{{\partial P_{\alpha }^{GB} (d)}}{\partial \alpha } = \frac{{(1 + d)^{\alpha } \{ \ln (\tfrac{1 + d}{10})10^{\alpha } - \ln (1 + d)\} - d^{\alpha } \{ \ln (\tfrac{d}{10})10^{\alpha } - \ln (d)\} }}{{(10^{\alpha } - 1)^{2} }},\quad d = 1,2, \ldots ,9. \hfill \\ \end{aligned} $$
(7)
Table 1 GBL fit for first digit of powerful integer powers: MAD vs. WLS criterion

For comparison, the MAD and WLS measures for the following size-dependent GBL exponent

$$ \alpha_{LL} (s \cdot m) = (2s)^{ - 1} \cdot \{ 1 + c \cdot 10^{ - a \cdot m} \} , $$
(8)

with \( c = 1,\;a = 0.21119 \), called LL estimator, are listed. This type of estimator is named in honour of Luque and Lacasa (2009) who introduced it in their GBL analysis for the prime number sequence. Through calculation one observes that the LL estimator minimizes the absolute deviations between the LL estimator and the MAD (resp. WLS) estimators over the finite ranges of powerful powers \( [ 1 ,10^{s \cdot m} ] , \, m = 8, \ldots ,15, \, s = 1,2,3,4,5 \). In fact, if one denotes the MAD and WLS estimators of the sequence \( \{ n_{pf}^{s} \} , \, n_{pf}^{s} < 10^{s \cdot m} \), by \( \alpha_{MAD} (s \cdot m) \) and \( \alpha_{WLS} (s \cdot m) \), then one has uniformly over the considered finite ranges (consult the columns “Δ to LL estimate” in Table 1 in units of \( \sqrt[3]{{10^{ - m} }} \))

$$ \begin{aligned} \left| {\alpha_{WLS} (s \cdot m) - \alpha_{LL} (s \cdot m)} \right| \le 3.116 \cdot \sqrt[3]{{10^{ - m} }}, \hfill \\ \left| {\alpha_{MAD} (s \cdot m) - \alpha_{LL} (s \cdot m)} \right| \le 2.650 \cdot \sqrt[3]{{10^{ - m} }}. \hfill \\ \end{aligned} $$
(9)

Table 1 displays exact results obtained on a computer with single precision, i.e., with 15 significant digits. The MAD (resp. WLS) measures are given in units of \( 10^{ - 6} \) (resp. \( \sqrt[3]{{10^{ - (m + s + 12)} }} \)). Taking into account the decreasing units, one observes that the optimal MAD and WLS measures decrease with increasing sample size.

Asymptotic counting function for powerful integer powers

The following mimics Hürlimann (2014a), Section 3. It is well-known that a random process with uniform density \( x^{ - 1} \) generates data that are Benford distributed. Similarly, a sequence of numbers generated by a power-law density \( x^{ - \alpha } , \, \alpha \in [0,1] \), has a GBL first digit distribution \( P_{1 - \alpha }^{GB} (d) \) with exponent \( 1 - \alpha \) [e.g. Pietronero et al. (2000), Eq. (3)]. From it, one derives a counting function \( C(N) \) that yields the number of elements of that sequence in the interval \( [ 1 ,N ] \). A local density of the form \( x^{ - \alpha (x)} \), such that \( C(N)\sim \int\nolimits_{2}^{N} {x^{ - \alpha (x)} dx} \), is usually not appropriate and must be modified as follows. The relation for powerful integer powers over an interval \( [ 1 ,N^{s} ] \) that belongs to (8) is

$$ \alpha (N^{s} ) = \frac{2s - 1 - \alpha (N)}{2s},\quad \alpha (N) = c \cdot N^{ - a} ,\quad a = 0.21119,\quad c \ge 1, $$
(10)

Denote by \( Q_{s} (N^{s} ) \) the counting function for powerful integer powers in the interval \( [ 1 ,N^{s} ] \). Instead of \( \int\nolimits_{2}^{{N^{s} }} {x^{{ - \alpha (N^{s} )}} dx} \) define

$$ Q_{s} (N^{s} ) = q \cdot (2s)^{ - 1} \cdot \int\limits_{2}^{{N^{s} }} {x^{{ - \alpha (N^{s} )}} dx} ,\quad q = \frac{{\zeta (\tfrac{3}{2})}}{\zeta (3)} = 2.17325, $$
(11)

with \( \zeta ( \cdot ) \) the Riemann zeta function. In this expression, the integral pre-factor is chosen to fulfill the asymptotic limiting value for the powerful number counting function, that is (note that \( n_{pf}^{s} < N^{s} \) if, and only if, one has \( n_{pf} < N \))

$$ \mathop {\lim }\limits_{N \to \infty } \frac{{Q_{s} (N^{s} )}}{\sqrt N } = q. $$
(12)

In fact, two analytical bounds on \( S(N) \) are known, namely

$$ q \cdot \sqrt N - 3 \cdot \sqrt[3]{N} \le S(N) \le q \cdot \sqrt N ,{\text{ and}} $$
(13)
$$ q \cdot \sqrt N - 1.83522 \cdot \sqrt[3]{N} \le S(N) \le q \cdot \sqrt N - 1.207684 \cdot \sqrt[3]{N},\quad N \ge 961. $$
(14)

The first one is classical and proved in Golomb (1970). The second improved estimate is due to Mincu and Panaitopol (2009), Theorem 1. However, it suffices to use the simple estimate (12), which is obtained as follows. From (11) one gets for arbitrary \( s = 1,2, \ldots \), taking into account (10), the equivalent asymptotic formula

$$\begin{aligned} Q_{s}(N^{s})&\sim \frac{q}{2s \cdot (1 - \alpha (N^{s}))} \cdot N^{s \cdot (1 - \alpha (N^{s} ))} Q_{s} (N^{s}) \\ &= \frac{q}{1 + \alpha (N)} \cdot N^{\tfrac{1}{2}(1 + \alpha (N))} = q \cdot {\sqrt{N}} \cdot \frac{N^{a}}{N^{a}+ c} \cdot \exp \left( {\tfrac{1}{2}c\frac{\ln (N)}{N^{a}}} \right) \end{aligned}$$
(15)

which is independent of \( s \) and simply denoted by \( Q(N) \). The equality \( Q_{s} (N^{s} ) = Q(N) \) reflects the fact that there are as many powerful integer powers in \( [ 1 ,N^{s} ] \) as there are powerful numbers in \( [ 1 ,N ] \). Now, what is a good value of \( c \in [ 1 ,\infty ) \)? Clearly, the factor

$$ f_{N} (c) = \frac{{N^{a} }}{{N^{a} + c}} \cdot \exp \left( {\tfrac{1}{2}c\frac{\ln (N)}{{N^{a} }}} \right) $$
(16)

converges from above to 1 as \( N \to \infty \) for any fixed \( c \). Its derivative with respect to \( c \) satisfies the property \( \tfrac{\partial }{\partial c}f_{N} (c) > 0, \, \forall \;c \in [ 1 ,\infty ) , \, \forall \;N \ge 4 \). Since \( f_{N} (c) \) increases in \( c \) and decreases in \( N \), one has the following max–min property of (16) at \( c = 1 \):

$$ \mathop {\hbox{max} }\limits_{{N \ge 10^{16} }} \left \{ \mathop {\hbox{min} }\limits_{c \in [ 1 ,\infty ) } f_{N} (c) \right \} = f_{{10^{16} }} (1) = 1.0073. $$
(17)

Therefore, the size-dependent exponent (10) with \( c = 1 \) not only minimizes the absolute deviations between the LL estimator and the MAD (resp. WLS) estimators over the finite ranges of powerful powers \( [ 1 ,10^{s \cdot m} ] , \, m = 8, \ldots ,15, \, s = 1,2,3,4,5 \), as shown in “Size-dependent generalized Benford law for powerful integer powers”, but it turns out to be uniformly best with maximum error less than \( 7.3 \cdot 10^{ - 3} \) against the asymptotic estimate, at least if \( N \ge 10^{16} \). Moreover, the following limiting asymptotic result has been shown.

First digit theorem for powerful integer powers (GBL for powerful integer powers)

The asymptotic distribution of the first digit of powerful integer power sequences \( n_{pf}^{s} < 10^{s \cdot m} , \, m \ge 8 \), for fixed \( s = 1,2,3, \ldots \), as \( m \to \infty \), is given by

$$ \begin{aligned} \begin{array}{l} \mathop {\lim }\limits_{m \to \infty } \frac{{I_{s \cdot m}^{s} (d)}}{{S(10^{m} )}} = \mathop {\lim }\limits_{m \to \infty } P_{\alpha (s \cdot m)}^{GB} (d) = P_{{(2s)^{ - 1} }}^{GB} (d),\quad d = 1, \ldots ,9, \hfill \\ \alpha (s \cdot m) = \frac{1}{2s}\left( {1 + \frac{1}{{10^{a \cdot m} }}} \right),\quad a = 0.21119. \hfill \\ \end{array} \end{aligned} $$
(18)

The next Table 2 compares the new counting function \( Q(N)/q\sqrt N \) with the lower bound for \( S(N)/q\sqrt N \) in (14), as well as the ratios of these with the asymptotic value \( q \cdot \sqrt N \). Clearly, the new counting function is an approximation to the latter from above.

Table 2 Comparison of powerful number counting functions for \( N = 10^{m} \)

Conclusions

The first digits of some integer sequences like integer powers and square-free integer powers to a fixed power exponent \( s \ge 1 \), follow a generalized Benford law with size-dependent exponent that converges asymptotically to a GBL with the inverse power exponent \( s^{ - 1} \). In contrast to this, there exist integer sequences, for which such power sequences behave like a GBL with parameter different from \( s^{ - 1} \). The analysed powerful integer power sequences follow a GBL with parameter \( (2s)^{ - 1} \) and are in this respect similar to powers from perfect power numbers studied previously in Hürlimann (2014b). Moreover, all these power sequences typically are not exactly Benford distributed. Departures from Benford’s law occur quite frequently within mathematics and in almost all related scientific disciplines, and must be analysed along the line of appropriate probabilistic models. The companion papers Hürlimann (2015a, b) introduce and discuss some possibilities for a scientific wider use.