Leading Digit Laws on Linear Lie Groups

We determine the leading digit laws for the matrix components of a linear Lie group $G$. These laws generalize two observations: the normalized Haar measure of the Lie group $\mathbb{R}^+$ is $dx/x$, and the scale invariance of $dx/x$ implies that the distribution of the digits follows Benford's law. Benford's law states that the probability of observing a significand base $B$ of at most $s$ is $\log_B(s)$; thus the first digit is $d$ with probability $\log_B(1 + 1/d)$. Viewing this scale invariance as left invariance of Haar measure, we determine the power laws in significands from one matrix component of various such $G$. We also determine the leading digit distribution of a fixed number of components of a unit sphere, and find periodic behavior when the dimension of the sphere tends to infinity in a certain progression.

For a base $B > 1$, every positive real number $x$ can be written uniquely as $x = S_B(x) \cdot B^{k(x)}$, where $S_B(x) \in [1, B)$ is the significand and $k(x) \in \mathbb{Z}$. The distribution of $S_B(x)$ has interested researchers in a variety of fields for over a hundred years, as frequently it is not uniformly distributed over $[1, B)$ but exhibits a profound bias. If $\mathrm{Prob}(S_B(x) \le s) = \log_B(s)$ we say the system follows Benford's law, which immediately implies the probability of a first digit of $d$ is $\log_B(d + 1) - \log_B(d) = \log_B(1 + 1/d)$ (at least if $d + 1 \le B$); in particular, in base 10 the first digit is 1 about 30% of the time, and 9 for only around 4.6% of the values. This bias was first observed by Newcomb [New] in the 1880s, and then rediscovered by Benford [Ben] nearly 50 years later.
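The definitions above can be made concrete with a minimal sketch. The helper names `significand` and `benford_first_digit` are illustrative choices, not notation from the paper; the code only evaluates $S_B(x)$ and the first-digit probabilities $\log_B(1 + 1/d)$.

```python
import math

def significand(x, base=10):
    """Return S_B(x) in [1, B) for a nonzero real x, so that |x| = S_B(x) * B**k."""
    x = abs(x)
    if x == 0:
        raise ValueError("the significand of 0 is undefined")
    s = x / base ** math.floor(math.log(x, base))
    # Guard against floating-point rounding at exact powers of the base.
    while s >= base:
        s /= base
    while s < 1:
        s *= base
    return s

def benford_first_digit(d, base=10):
    """Probability log_B(1 + 1/d) that the first digit is d under Benford's law."""
    return math.log(1 + 1 / d, base)

# First-digit probabilities in base 10 for d = 1, ..., 9.
probs = [benford_first_digit(d) for d in range(1, 10)]
```

The list `probs` sums to 1 because the product of the factors $(1 + 1/d)$ telescopes to $B$.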
Many systems follow Benford's law; on the pure math side these include the Fibonacci numbers (and most solutions to linear recurrence relations) [BrDu], iterates of the 3x+1 map [KonMi, LagSo], and values of L-functions on the critical strip among many others; on the applied side examples range from voter and financial data [Meb, Nig] to the average error in floating point calculations [Knu]. See [BH3,Mil] for two recent books on the subject, the latter describing many of the applications from detecting fraud in taxes, images, voting and scientific research, [BH2,Dia,Hi1,Hi2,Pin,Rai] for some classic papers espousing the theory, and [BH1,Hu] for online collections of articles on the subject.
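As a small illustration of the first pure-math example, the sketch below tallies the leading decimal digits of the first 1000 Fibonacci numbers and compares the frequency of the digit 1 with $\log_{10} 2$. The function name and the cutoff of 1000 terms are arbitrary choices made for this example.

```python
import math

def fibonacci_first_digits(count):
    """Count the leading decimal digits of the first `count` Fibonacci numbers."""
    counts = {d: 0 for d in range(1, 10)}
    a, b = 1, 1
    for _ in range(count):
        counts[int(str(a)[0])] += 1  # leading digit via the decimal string
        a, b = b, a + b
    return counts

counts = fibonacci_first_digits(1000)
freq_one = counts[1] / 1000       # empirical frequency of a leading 1
benford_one = math.log10(2)       # Benford prediction, about 0.301
```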
Our purpose is to explore the distribution of leading digits of components chosen from some random process. We concentrate on two related systems. The first are various $n \times n$ matrix ensembles, which of course can be viewed as vectors living in $\mathbb{R}^{n^2}$. The second are components of a point uniformly chosen on a unit sphere, which turn out to imply results for some of our matrix ensembles.
Following the work of Montgomery [Mon], Odlyzko [Od1,Od2], Katz-Sarnak [KaSa2], Keating-Snaith [KeSn2,KeSn3], Conrey-Farmer-Keating-Rubinstein-Snaith [CFKRS] and many others, random matrix ensembles in general, and the classical compact groups in particular, have been shown to successfully model a variety of number theory objects, from special values to distribution of zeros to moments. In some number theory systems Benford's law has already been observed (such as values of L-functions in [KonMi], or values of Fourier coefficients in [AnRoSt]); thus our work can be interpreted as providing another explanation for the prevalence of Benford's law.
We first quickly review some needed background material and then state our results.
1.2. Haar Measure Review.
Random matrix theory has enjoyed numerous successes over the past few decades, successfully modeling a variety of systems from energy levels of heavy nuclei to zeros of L-functions [BFMT-B, FiMil, Ha]. Early work in the subject considered ensembles where the matrix elements were drawn independently from a fixed probability distribution $p$; this of course led to questions and conjectures on how various statistics (such as spacings between normalized eigenvalues) depended on $p$. For example, while the density of normalized eigenvalues in matrix ensembles (Wigner's semi-circle law) was known for all ensembles where the entries were chosen independently from nice distributions, the universality of the spacings between adjacent normalized eigenvalues resisted proof until this century (see, among others, [ERSY,ESY,TV1,TV2]). Instead of choosing the matrix elements independently and having to choose a $p$, we can consider matrix groups where the Haar measure gives us a canonical choice for randomly choosing a matrix element.$^1$
On an $n$-dimensional Lie group $G$ there exists a non-trivial countably additive measure $\mu$, unique up to a positive scalar multiple, which is left translation invariant (so $\mu(gE) = \mu(E)$ for all $g \in G$ and $E$ a Borel set); $\mu$ is called the Haar measure. If our space is compact we may normalize $\mu$ so that it assigns a measure of 1 to $G$ and thus may be interpreted as a probability. See [HR] for more details on the Haar measures and Lie groups.
We are especially interested in the case where $G \subset GL(V)$ is a connected linear Lie group; we take $p_{i,j}$ to be the projection of $G$ onto the $(i, j)$-th coordinate and study the distribution of the leading digits. For many $G$ the resulting behavior is easily determined, and follows immediately from the observation that a system whose density is $\frac{1}{\log B}\frac{1}{x}$ on $[1, B)$ follows Benford's law (see Definition 1.1 and Theorem 1.4). After introducing some terminology, we state five cases which are immediately analyzed from the Haar density; Theorem 1.7 plays a key role in our later work (Theorem 1.8). These theorems are interpretations of Haar measure decompositions of classical noncompact $G$ (see [HR]). Care must be taken to separate the notion of digit law for the compact and noncompact cases, since many noncompact $G$ do not possess a $G$-invariant probability measure. So we have two definitions of leading digit law. For noncompact $G$, we average the measure of significands over a neighborhood of a specific one-parameter subgroup (see Definition 1.2 for a precise statement). If $G$ is compact, the Haar measure affords a global average over all matrix elements. So one may think of the noncompact digit law as local, and the compact digit law as global (see Definition 1.10).
$^1$ These are the ensembles that turn out to be most useful in number theory, not the ones arising from a fixed distribution.
be the disk of radius $\epsilon$ containing $X$ that is orthogonal to $X$ in $L(G)$, we have (1.5). We typically take $\mu$ to be the left or right invariant Haar measure on $G$. If $\mu$ is left or bi-invariant, (1.5) becomes (1.6).
Proof. As the Lie algebra $L(\mathbb{R}^+) = \mathbb{R}$ of $\mathbb{R}^+$ is one dimensional, the perpendicular subspace to $\mathbb{R}$ is $\{0\}$. Thus for any $s \in [1, B)$, one has $U_\epsilon([0, \log s)X) = [0, \log s)X$, whence (1.6) becomes […]
In the spirit of Theorem 1.4, when the Haar density decomposes as a product of densities on the matrix components, as it does in the next three theorems, the digit laws are easily determined from formulation (1.6).
Theorem 1.5. Let $G = P$ be the group of real-valued upper triangular matrices (1.8). The leading digit law of $a_{ii}$ for the left invariant Haar density $dg_L$ is […]
Proof. The left invariant Haar measure on $P$ has density
$dg_L = \frac{1}{a_{11} a_{22}^2 \cdots a_{nn}^n} \prod_{i \le j} da_{ij}$ (1.9)
and the right invariant Haar measure on $P$ has density
$dg_R = \frac{1}{a_{11}^n a_{22}^{n-1} \cdots a_{nn}} \prod_{i \le j} da_{ij}$,
where $da_{ij}$ is the Lebesgue density on $\mathbb{R}$ in both cases. All leading digit laws follow.
Theorem 1.6. Let $D$ be the group of real-valued diagonal matrices (1.11). For each $i$ between 1 and $n$, the leading digit law of $a_{ii}$ with respect to the bi-invariant Haar density $dg$ is $B$-Benford.

Proof. The bi-invariant Haar measure on $D$ has density $dg = \prod_{i=1}^{n} \frac{da_{ii}}{|a_{ii}|}$, where $da_{ii}$ is the Lebesgue measure on $\mathbb{R}$. The digit laws follow.
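The step "the digit laws follow" rests on the scale invariance of $dx/x$. As a hedged sketch (it computes a ratio of measures over a single window $[B^k, B^{k+1})$ and is not a restatement of Definition 1.2), the proportion of that window occupied by significands at most $s$ is independent of $k$:

```latex
\[
\frac{\displaystyle\int_{B^k}^{sB^k} \frac{dx}{x}}
     {\displaystyle\int_{B^k}^{B^{k+1}} \frac{dx}{x}}
= \frac{\log(sB^k) - \log(B^k)}{\log(B^{k+1}) - \log(B^k)}
= \frac{\log s}{\log B}
= \log_B(s),
\qquad s \in [1, B),\ k \in \mathbb{Z}.
\]
```

Because the ratio does not depend on $k$, any average over such windows returns $\log_B(s)$, which is the $B$-Benford law.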
Theorem 1.7. Let $D_1$ be the group of real-valued, determinant 1 diagonal matrices (1.13). For each $i$ between 1 and $n$, the leading digit law of $a_{ii}$ with respect to the bi-invariant Haar density $dg$ is $B$-Benford.

Main Results.
Our first result concerns the distribution of entries from $SL_n(\mathbb{R})$. Write $G = SL_n(\mathbb{R})$ and denote by $L, U, D_1 \subset G$ the subgroups of unipotent lower triangular, unipotent upper triangular, and diagonal matrices of $SL_n(\mathbb{R})$. Then $g \in G$ (outside a set of Haar measure zero) can be uniquely expressed as $g = lud$, $l \in L$, $u \in U$, $d \in D_1$. Note that each of $L, U, D_1$ is topologically closed in $SL_n(\mathbb{R})$, and hence each is a Lie subgroup of $G$. If $\mathfrak{l}, \mathfrak{u}, \mathfrak{d}_1$ are the Lie algebras of $L, U, D_1$ respectively, then $\mathfrak{l}, \mathfrak{u}, \mathfrak{d}_1$ have the vector space bases (which we review in Appendix A) […], where $E_{i,j}$ is the $n \times n$ matrix with 1 in the $(i, j)$ position and zeroes elsewhere.
where $dX$, $dY$ are the Lebesgue measures on $\mathfrak{l}$, $\mathfrak{u}$ and […] is the Haar measure on $D_1$. Consequently, the joint distribution of diagonal components is a product of $B$-Benford measures.
The next corollary follows immediately from the invariance of $dg$ on $SL_n(\mathbb{R})$.
Corollary 1.9. Let $P, Q \in SL_n(\mathbb{R})$ be even order permutation matrices. For $A \in SL_n(\mathbb{R})$, the joint distribution of the diagonal components of $PAQ$ is a product of $B$-Benford measures.
In other words, the joint distribution of $n$ components is a product of $B$-Benford measures if there is an even permutation of the rows and columns which sends the $n$ components to the diagonal components. As an immediate consequence of the above, we obtain results on the behavior of determinants of matrices from $GL_n(\mathbb{R})^+$ (Theorem B.3). For other results related to Benford's law and matrices, see [B-], who prove that as the size of matrices with entries i.i.d.r.v. from a nice fixed distribution tends to infinity, the leading digits of the $n!$ terms in the determinant expansion converge to Benford's law. Also see [BH3] for results arising from powers of fixed matrices.
When $G$ is compact, the Haar measure may be normalized to be an invariant probability measure on $G$, affording a global definition of digit law, stated next.
Definition 1.10. Fix a base $B > 1$. Let $G$ be a compact connected Lie group, $\mu$ a positive countably additive probability measure on $G$, $f$ : […] (1.19)
We shall see that when $G = O(n)$ or $U(n)$, $f$ is a projection of $G$ onto the $(i, j)$-th component and $\mu$ is Haar, the digit laws come as a consequence of digit laws from a point drawn at random from a unit sphere (see Corollary 1.14). So our next result yields digit laws for components of a point drawn at random on an $n$-dimensional sphere of radius $r$:
$S^n(r) = \{x \in \mathbb{R}^{n+1} : |x| = r\}$. (1.20)
We adopt the notational convention for the unit sphere: $S^n := S^n(1)$.
Theorem 1.11. Let $x_1$ be the first component of an $x \in S^n$ chosen uniformly at random. We have […] As $n \to \infty$, Stirling's formula implies the above converges to integrating a Gaussian density, where erf is the standard error function: (1.22)
Lemma 1.12. Fix a base $B > 1$ and $1 \le a < b < B$. Let $x_1$ and $x$ be as in Theorem 1.11. As […] (1.24)
Remark 1.13. Lemma 1.12 has an interesting consequence. First, consider the sequence of spheres $S^{nB^{2\ell}}$, $\ell \in \mathbb{N}$. For $n$ sufficiently large, with $\sqrt{n/2}\,\frac{1}{B} > 4$, […] Then (1.25). By choice of $n$, the additional terms from extending the sums to all $i$ are more than 4 standard deviations from the mean, and contribute negligibly to the sum in the limit. Hence for $n$ sufficiently large, […] For fixed $n \in \mathbb{N}$, it follows that the leading digit law of $x_1$ in $S^{nB^{2\ell}}$, as $\ell \to \infty$, tends to the digit law $F_n : [1, B) \to [0, 1)$ whose cumulative distribution function is given by (1.27). As $F_n(x) = F_{nB^2}(x)$ for any $n \in \mathbb{N}$, it follows that the leading digit law of $x_1$ in $S^k$, $k \to \infty$, falls into the periodic cycle of $B^2 - 1$ limiting digit laws $F_n$, $1 \le n < B^2$, as defined in (1.27). We plot a representative set of $n$ in Figure 1.
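The periodicity in Remark 1.13 can be illustrated numerically under a simplifying assumption: replace $x_1$ on a sphere of dimension $m$ by a centered Gaussian of variance $1/m$. The function `gaussian_digit_cdf` below is a hypothetical helper built on that approximation; it is not the paper's formula (1.27), only a sketch of why multiplying the dimension by $B^2$ leaves the significand law unchanged.

```python
import math

def gaussian_digit_cdf(m, s, base=10, kmax=40):
    """P(S_B(|Z| / sqrt(m)) <= s) for a standard normal Z (Gaussian approximation)."""
    scale = math.sqrt(m / 2.0)
    total = 0.0
    for k in range(-kmax, kmax + 1):
        lo = base ** k * scale
        # P(B^k <= |Z|/sqrt(m) < s B^k) = erf(s B^k sqrt(m/2)) - erf(B^k sqrt(m/2))
        total += math.erf(s * lo) - math.erf(lo)
    return total

# Scaling the dimension by B^2 rescales |Z|/sqrt(m) by 1/B, which shifts k by one
# and leaves the significand distribution unchanged.
same_law = abs(gaussian_digit_cdf(3, 2.0) - gaussian_digit_cdf(300, 2.0))
```

Different values of $m$ that are not related by a power of $B^2$ generally give different laws, which is the source of the cycle of limiting distributions.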
Lemma 1.12 and its consequences can be generalized to a fixed number of components; we do this in Lemma 3.1.
The spherical digit law in one component (Lemma 1.12) yields digit laws for the compact matrix group $O_n(\mathbb{R})$, stated next.
Corollary 1.14. The leading digit law in the $(i, j)$ component of $O_n(\mathbb{R})$ with respect to Haar is the leading digit law of $x_1$ in $S^{n-1}$ with respect to the uniform measure.
Proof. As $O_n(\mathbb{R})$ contains every permutation matrix in $GL_n(\mathbb{R})$, there exist permutation matrices $P, Q$ such that, for $A \in O_n(\mathbb{R})$, the map $A \mapsto PAQ \in O_n(\mathbb{R})$ sends the $(i, j)$ entry to the $(1, 1)$ entry. By invariance of $dg$, it suffices to prove the Corollary for the $(1, 1)$ component of $O_n(\mathbb{R})$. Recall that any matrix $A \in O_n(\mathbb{R})$ satisfies $A^T A = I$, so the columns of $A$ form an orthonormal basis of $\mathbb{R}^n$. We may therefore embed $O_n(\mathbb{R})$ in the product of $n$ spheres $S^{n-1} \times \cdots \times S^{n-1}$. Consider the construction of a matrix in $O_n(\mathbb{R})$ one column at a time from left to right. The first column $c_1$ can be selected arbitrarily from $S^{n-1}$. The second column $c_2$ is a vector selected in the orthogonal hyperplane to $c_1$ in $S^{n-1}$, a set which is isometric to $S^{n-2}$. In general the $i$th column is selected in the orthogonal complement of $c_1, \dots, c_{i-1}$ in $S^{n-1}$, which is a set isometric to $S^{n-i}$. Since the $O_n(\mathbb{R})$ action on a subset $A \subset O_n(\mathbb{R})$ preserves the Haar measure of $A$, there is a measure preserving transformation between a basis for the Haar measurable sets of $O_n(\mathbb{R})$ and measurable subsets $A_1 \times A_2 \times \cdots \times A_n \subset S^{n-1} \times S^{n-2} \times \cdots \times S^0$ equipped with the uniform measure on each $S^i$. Therefore, the digit law of the $(1, 1)$ component of $O_n(\mathbb{R})$ is equal to the digit law of $x_1$ in $S^{n-1}$ with the uniform measure. The leading digit law follows.
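The column-by-column construction in the proof can be sketched in code. The example below assumes the standard fact that applying Gram-Schmidt to independent Gaussian vectors produces a Haar-distributed orthogonal matrix; the function name `random_orthogonal` is an illustrative choice, and this is a sketch rather than the construction the authors had in mind.

```python
import math
import random

def random_orthogonal(n, rng):
    """Build an n x n orthogonal matrix one column at a time (Gram-Schmidt on Gaussians)."""
    cols = []
    while len(cols) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Project out the columns already chosen, leaving a vector in their
        # orthogonal complement.
        for c in cols:
            dot = sum(vi * ci for vi, ci in zip(v, c))
            v = [vi - dot * ci for vi, ci in zip(v, c)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > 1e-12:  # resample in the (probability zero) degenerate case
            cols.append([vi / norm for vi in v])
    # cols[j][i] is the (i, j) entry; return the matrix as a list of rows.
    return [[cols[j][i] for j in range(n)] for i in range(n)]

Q = random_orthogonal(4, random.Random(1))
```

The first column is a normalized Gaussian vector, hence uniform on $S^{n-1}$, so the $(1, 1)$ entry has the law of $x_1$ on $S^{n-1}$.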
Thus one sees the same asymptotic periodicity in the leading digit laws in the $(i, j)$ component of $O_n(\mathbb{R})$ with period $B^2 - 1$ in $n$. By invariance of $dg$, it follows that Corollary 1.14, Lemma 3.1, and the formulas therein yield leading digit laws for a fixed number of components of $O_n(\mathbb{R})$, so long as all components lie in the same row or column. Lastly, analogous digit laws for the real and imaginary parts in a fixed number of components of $U_n(\mathbb{C})$ are immediate, since $U_n(\mathbb{C})$ contains every permutation matrix and the first column of a matrix in $U_n(\mathbb{C})$ is a point on $S^{2n-1}$.

Remark 1.15. We leave the leading digit laws of a hyperbola as future research.
We prove Theorem 1.8 on components of $SL_n(\mathbb{R})$ in §2 (see also Appendix B for a more geometric proof in two dimensions), and then Theorem 1.11 in §3, discussing some additional consequences (we have already shown above how it yields digit laws for the classical compact groups). We then finish with some concluding remarks and thoughts on future research.

PROOF OF THEOREM 1.8
Let $L$, $U$, $D_1$ be the lower triangular, upper triangular, and determinant 1 diagonal subgroups, and let $\mathfrak{l}, \mathfrak{u}, \mathfrak{d}_1$ be as before; we can calculate the density of $dg$ with respect to the decomposition $G = LUD_1$. Pick any $g_0$ in $G$, and parametrize $g$ in a neighborhood of $g_0$ using exponential coordinates
$g = g(X, Y, Z) = g_0 \exp X \exp Y \exp Z$, (2.1)
where $X \in \mathfrak{l}$, $Y \in \mathfrak{u}$, $Z \in \mathfrak{d}_1$, and $\mathfrak{l} + \mathfrak{u} + \mathfrak{d}_1 = \mathfrak{g}$. It follows that the derivative at $g_0$ in the direction of $X$ is […] By a change of variables and left invariance, the differential with respect to coordinate bases of $\mathfrak{l}, \mathfrak{u}, \mathfrak{d}_1$ is given by the block matrix […] where $[\mathrm{Ad}(\exp Z \exp Y)^{-1}(X)]_{\mathfrak{l}}$ is the part of $\mathrm{Ad}(\exp Z \exp Y)^{-1}(X)$ that lies in the subspace $\mathfrak{l}$. Thus, the volume element on $G = LUD_1$ in a neighborhood of $g_0$ is given by […] and is independent of $g_0$. By Fubini's theorem, $\int_G \phi(g)\,dg$ is […] (2.7)
Using the ordinary basis $\{E_{i,j}\}_{i<j}$ of $\mathfrak{u}$, the adjoint action of the diagonal subgroup $D_1$ on $\mathfrak{u}$ is […] It follows that (with respect to exponential coordinates of the first kind) (2.9). With respect to the basis $\{E_{i,j}\}_{i>j}$ of $\mathfrak{l}$, one can see that (2.11). Ordering the basis of $\mathfrak{l}$ along sub-diagonals, $\mathrm{Ad}((ud)^{-1})_{\mathfrak{l}}$ becomes upper triangular, with
$\mathrm{Ad}(u^{-1})_{\mathfrak{l}} = \mathrm{id}_{\mathfrak{l}}$ (2.12)
for all $u \in U$. Therefore, $|\det \mathrm{Ad}((ud)^{-1})_{\mathfrak{l}}| = |\det \mathrm{Ad}(d^{-1})_{\mathfrak{l}}|$, and the adjoint action of $D_1$ on $\mathfrak{l}$ is simply (2.13). Therefore
$|\det \mathrm{Ad}((ud)^{-1})_{\mathfrak{l}}|\,|\det \mathrm{Ad}(d^{-1})_{\mathfrak{u}}| = 1$ (2.14)
and Theorem 1.7 completes the proof.
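The cancellation in (2.14) can be checked directly from the action of a diagonal matrix on the elementary matrices. The following is a sketch of that computation, writing $d = \mathrm{diag}(d_1, \dots, d_n)$; the notation $d_i$ for the diagonal entries is introduced here for illustration and may differ from the displays lost from the text above.

```latex
\[
\mathrm{Ad}(d^{-1})E_{i,j} = d^{-1}E_{i,j}\,d = \frac{d_j}{d_i}\,E_{i,j},
\qquad d = \mathrm{diag}(d_1,\dots,d_n) \in D_1,
\]
\[
\det \mathrm{Ad}(d^{-1})\big|_{\mathfrak{u}} = \prod_{i<j}\frac{d_j}{d_i},
\qquad
\det \mathrm{Ad}(d^{-1})\big|_{\mathfrak{l}} = \prod_{i>j}\frac{d_j}{d_i}
= \prod_{i<j}\frac{d_i}{d_j},
\]
\[
\left|\det \mathrm{Ad}(d^{-1})\big|_{\mathfrak{l}}\right|\,
\left|\det \mathrm{Ad}(d^{-1})\big|_{\mathfrak{u}}\right| = 1 .
\]
```

The two products are reciprocal because swapping the roles of $i$ and $j$ turns the index set $i > j$ into $i < j$ and inverts each factor.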
We provide another proof of Theorem 1.8 through a geometric approach, based on the area of the hyperbolic sector, in Appendix B.

PROOF AND CONSEQUENCES OF THEOREM 1.11
For $r > 0$, let
$S^n(r) = \{x \in \mathbb{R}^{n+1} \mid |x| = r\}$ (3.1)
be the sphere of radius $r$ in $\mathbb{R}^{n+1}$. Denote by $V_n(r)$ and $S_n(r)$ the enclosed volume and the surface area of $S^n(r)$ (recall we write $S^n$ for the unit sphere). Fix a base $B > 1$ and let $S_B(x)$ be the significand function, i.e., $S_B(|y|) \in [1, B)$ is the unique number satisfying $|y| = S_B(|y|) \cdot B^k$ for some $k \in \mathbb{Z}$.
Proof of Theorem 1.11. Pick a point $x \in S^n$ uniformly at random, and let $x_1$ be the first component of $x$. We are interested in the leading digit distribution of $x_1$. By symmetry, the distribution for the other components is the same. Notice in $\mathbb{R}^{n+1}$ that for $0 < a < 1$ […] Approximating the surface area in the strip $\{a < x_1 < b,\ x \in S^n\}$ by a frustum, it follows for $n > 0$ that (3.4). By the familiar relationships $S_n(r) = V_n'(r) = \frac{n+1}{r} V_n(r)$, and the closed form solution […], we obtain
$\mathrm{Prob}(a < x_1 < b) = \frac{\Gamma(n/2 + 1/2)}{\sqrt{\pi}\,\Gamma(n/2)} \int_a^b (1 - x_1^2)^{n/2 - 1}\,dx_1$. (3.6)
Now, fix $a, b$, where $1 \le a < b \le B$. By symmetry, we may double the digit distribution in the positive half-space $x_1 > 0$. Thus […] For example, when $n = 1$ we have […] and thus […] The leading digit distribution on $S^2$ is uniform, with respect to any base, which is akin to the fact that equal width slices of a spherical loaf contain the same amount of crust. Our main theorem is an asymptotic result. In a sense, the digit distribution is found by applying Stirling's formula and integrating the standard Gaussian.
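The claim about $S^2$ can be checked numerically. The sketch below assumes the standard marginal density of $x_1$ on $S^n \subset \mathbb{R}^{n+1}$, namely $\frac{\Gamma(n/2+1/2)}{\sqrt{\pi}\,\Gamma(n/2)}(1 - x^2)^{n/2 - 1}$ on $[-1, 1]$, and sums it over the intervals $[B^{-k}, sB^{-k}]$ with a midpoint rule. The function name, the truncation `kmax`, and the step count are choices made for this example.

```python
import math

def sphere_digit_cdf(n, s, base=10, kmax=40, steps=2000):
    """P(S_B(|x_1|) <= s) for x uniform on the unit sphere S^n in R^(n+1)."""
    # Normalising constant Gamma((n+1)/2) / (sqrt(pi) * Gamma(n/2)), via log-gamma.
    log_c = math.lgamma((n + 1) / 2) - math.lgamma(n / 2) - 0.5 * math.log(math.pi)
    c = math.exp(log_c)
    total = 0.0
    for k in range(1, kmax + 1):
        lo, hi = base ** -k, min(s * base ** -k, 1.0)
        h = (hi - lo) / steps
        # Midpoint rule for the integral of (1 - x^2)^(n/2 - 1) over [lo, hi].
        area = sum((1 - (lo + (j + 0.5) * h) ** 2) ** (n / 2 - 1)
                   for j in range(steps)) * h
        total += 2 * c * area  # factor 2 accounts for both signs of x_1
    return total

# On S^2 the density of x_1 is constant, so the significand is uniform on [1, B):
# the value at s is (s - 1) / (B - 1).
value_on_s2 = sphere_digit_cdf(2, 5.5)
```

For larger $n$ the same function can be compared against the Gaussian limit described in the text.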
Proof of Lemma 1.12. Let $a, b \in \mathbb{R}$ satisfy $1 \le a < b < B$. Recall, from the above derivation, […] By Stirling's approximation
$\frac{\Gamma(n/2 + 1/2)}{\Gamma(n/2)} = \sqrt{\frac{n}{2}} + O(1)$. (3.13)
Using this with the substitution $x = y\sqrt{2/n}$, $dx = dy\sqrt{2/n}$ in the integrand yields, for $n$ sufficiently large and $x \in S^n$, that […] (3.14)
Lemma 1.12 can be generalized to many components. Pick a point at random on the unit sphere $S^n \subset \mathbb{R}^{n+1}$, and consider the first $k$ components $x_1, \dots, x_k$ ($k < n + 1$). We are interested in the joint distribution of leading digits that appear in the first $k$ components. Similar to the analysis above, for a point $(a_1, a_2, \dots, a_k)$ in the open unit disk $D^k$, notice that the other $n - k + 1$ components lie in an $(n-k)$-sphere of radius $\sqrt{1 - a_1^2 - \cdots - a_k^2}$. Exploiting the rotational symmetry in the last $n - k + 1$ components, we may parametrize the surface element $dS_n$ of $S^n$ by $D^k$ as […]
Lemma 3.1. Fix an integer $k > 0$, and let $a_1, b_1, \dots, a_k, b_k \in \mathbb{R}$ satisfy $0 \le |a_i| < |b_i| < 1$, $1 \le i \le k$. For $n > k$ sufficiently large, […] in the sense that the difference between (3.16) and (3.17) tends to zero as $n \to \infty$.
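The Stirling estimate (3.13) used in the proof of Lemma 1.12 is easy to check numerically. The sketch below evaluates the gamma ratio through `math.lgamma` to avoid overflow and prints it next to $\sqrt{n/2}$; the helper name `gamma_ratio` is an illustrative choice.

```python
import math

def gamma_ratio(n):
    """Gamma(n/2 + 1/2) / Gamma(n/2), computed stably through log-gamma."""
    return math.exp(math.lgamma(n / 2 + 0.5) - math.lgamma(n / 2))

# Compare the ratio with sqrt(n/2) for growing n.
for n in (10, 100, 1000, 10000):
    print(n, gamma_ratio(n), math.sqrt(n / 2))
```

The two columns agree to within a relative error that shrinks as $n$ grows, which is what the substitution $x = y\sqrt{2/n}$ relies on.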
Corollary 3.2. Fix an integer $k > 0$. For any base $B > 1$, and $a_1, b_1, \dots, a_k, b_k \in \mathbb{R}$ satisfying $1 \le a_i < b_i < B$, $1 \le i \le k$, we have for $n > k$ sufficiently large […] in the sense that the difference between (3.20) and (3.21) tends to zero as $n \to \infty$. In particular, the joint leading digit distribution of the first $k$ components is asymptotically periodic in $n$, with period $B^2$, tending to one of the $B^2 - 1$ limiting distributions (3.22).

CONCLUSIONS AND FUTURE WORK
Our results above can serve as a means for detecting underlying symmetries of a physical system. For example, imagine we are trying to construct matrices from one of the classical compact groups according to Haar measure (see [Mez] for a description of how to do this). We can use our digit laws as a test of whether or not we are simulating the matrices correctly. It would be interesting to generalize the arguments above to other groups of matrices, including those over fields other than the reals.
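One way such a test might look in practice is sketched below: compare the empirical distribution of significands from simulated data with a reference digit law and report the largest gap. The function `digit_law_distance`, the grid size, and the sample data are assumptions made for this illustration, and no particular rejection threshold is implied by the paper.

```python
import bisect
import math
import random

def significand(x, base=10):
    """S_B(x) in [1, B), computed as B raised to the fractional part of log_B|x|."""
    return base ** (math.log(abs(x), base) % 1.0)

def digit_law_distance(samples, reference_cdf, base=10, grid=200):
    """Largest gap on a grid between the empirical significand CDF and reference_cdf."""
    sig = sorted(significand(x, base) for x in samples if x != 0)
    worst = 0.0
    for j in range(1, grid):
        s = 1 + (base - 1) * j / grid
        empirical = bisect.bisect_right(sig, s) / len(sig)
        worst = max(worst, abs(empirical - reference_cdf(s)))
    return worst

rng = random.Random(0)
benford_like = [10 ** rng.random() for _ in range(20000)]   # exactly Benford in base 10
uniform_like = [rng.uniform(1, 10) for _ in range(20000)]   # not Benford
d_good = digit_law_distance(benford_like, math.log10)
d_bad = digit_law_distance(uniform_like, math.log10)
```

A small distance is consistent with the reference law, while a large one (as for the uniform sample here) signals that the simulated ensemble does not follow it.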

APPENDIX A. LINEAR LIE GROUPS
A Lie group $G \subset GL(V)$ is a group equipped with a differentiable structure such that the binary operation $G \times G \to G$ is differentiable. The Lie algebra $L(G)$ may be naturally identified with the tangent space $T_e(G)$ to the identity. For a direction $X \in L(G)$ there is a unique one parameter subgroup $\exp(tX)$, $t \in \mathbb{R}$, in the direction of $X$, and the map $\exp : L(G) \to G$ is a local diffeomorphism. Let $E_{ij}$ be the $n \times n$ matrix with 1 in the $(i, j)$ entry and zeroes elsewhere.
The groups in this paper are the following.
• The general linear group $GL_n(\mathbb{R})$ of matrices of nonzero determinant and its Lie algebra $\mathfrak{gl}_n(\mathbb{R})$ of all $n \times n$ matrices.
• The special linear group $SL_n(\mathbb{R}) = \{A \in GL_n(\mathbb{R}) \mid \det A = 1\}$ and its Lie algebra $\mathfrak{sl}_n(\mathbb{R}) = \{X \in \mathfrak{gl}_n(\mathbb{R}) \mid \operatorname{tr} X = 0\}$ of traceless matrices.
• The space of diagonal matrices $D \subset GL_n(\mathbb{R})$ with nonzero diagonal entries and its Lie algebra $\mathfrak{d}$ of diagonal matrices with entries in $\mathbb{R}$.
• The space of determinant 1 diagonal matrices $D_1(\mathbb{R}) \subset GL_n(\mathbb{R})$ with nonzero diagonal entries and its Lie algebra $\mathfrak{d}_1$ of traceless diagonal matrices with entries in $\mathbb{R}$.
• The space of upper triangular matrices $U(\mathbb{R}) \subset GL_n(\mathbb{R})$ with nonzero diagonal entries and its Lie algebra $\mathfrak{u}$ of upper triangular matrices with entries in $\mathbb{R}$.
• The space of lower triangular matrices $L(\mathbb{R})$ with nonzero diagonal entries and its Lie algebra $\mathfrak{l}$ of lower triangular matrices with entries in $\mathbb{R}$.
The goal of this section is to provide a geometric proof of Theorem 1.8 in two dimensions. We start with a useful, classical result.
Lemma B.1. The area of the hyperbolic cone […]
Proof. The region under the curve $1/x$ between $x = a$ and $x = b$ has area $\log(b) - \log(a) = \log(b/a)$, and one can form the sector from this region by first attaching the triangle with corners $(0, 0)$, $(a, 0)$, $(a, 1/a)$ and then removing the triangle with corners $(0, 0)$, $(b, 0)$, $(b, 1/b)$. Both triangles have area 1/2, so the sector also has area $\log(b/a)$.
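The area computation in the proof can be confirmed numerically. The sketch below approximates the sector bounded by the rays from the origin to $(a, 1/a)$ and $(b, 1/b)$ and the hyperbola $xy = 1$ by thin triangles with apex at the origin; the function name and the number of pieces are choices made for this example.

```python
import math

def hyperbolic_sector_area(a, b, pieces=10000):
    """Area between the rays from the origin to (a, 1/a) and (b, 1/b) and the curve xy = 1."""
    ratio = (b / a) ** (1.0 / pieces)
    xs = [a * ratio ** j for j in range(pieces + 1)]  # log-spaced points on [a, b]
    area = 0.0
    for x0, x1 in zip(xs, xs[1:]):
        # Triangle with vertices (0, 0), (x0, 1/x0), (x1, 1/x1):
        # half the absolute cross product x0 * (1/x1) - x1 * (1/x0).
        area += 0.5 * abs(x0 / x1 - x1 / x0)
    return area

approx = hyperbolic_sector_area(1.0, 4.0)
exact = math.log(4.0)  # log(b / a)
```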
Treating $SL_2(\mathbb{R})$ as the graph of $d = (1 + bc)/a$, construct from $A \subset SL_n(\mathbb{R})$ the cone on $A$ to the origin. Since the $SL_2(\mathbb{R})$ action preserves volume, the Haar measure on $SL_n(\mathbb{R})$ equals (up to a scalar) the volume of the cone on $A \subset SL_n(\mathbb{R})$. This observation forms the basis of the proof.
We give a series of statements that simplify the argument but create no loss of generality. Clearly $dg$ is $B$-Benford in the $(1, 1)$ component if and only if $c\,dg$ is $B$-Benford in the $(1, 1)$ component, so we take the Haar measure on $SL_2(\mathbb{R})$ that was constructed earlier. Let $a_{11} = a$; notice that $a = 0$ is a zero measure subset of $(SL_2(\mathbb{R}), \mu)$, so we treat $SL_2(\mathbb{R})$ as the graph of the function $d = (1 + bc)/a$. By symmetry it suffices to prove the theorem when our sequence of compact sets $K_i$ lie in $SL_2(\mathbb{R})^+$, when $K = \mathrm{graph}(d)$ is defined over a rectangular domain […]
Recall that $\mu(K) = \mu(\mathrm{graph}(d)) = \lambda(C(\mathrm{graph}(d)))$ is the volume of the cone consisting of all line segments from $O$ to the graph of $d$. Consider the solid $S := S(\mathrm{graph}(d))$ bounded below the graph of $d$ whose volume is […] We wish to relate $\lambda(C(\mathrm{graph}(d)))$ to $\lambda(S(\mathrm{graph}(d)))$. By our restriction to positive coordinates, we see that $d$ is decreasing along each ray emanating from the origin in a direction of $D$. As we are assuming $\mathrm{graph}(d) > 0$ on $D$, $\lambda(C(\mathrm{graph}(d)))$ can be found by appending to $S$ the three pyramidal regions whose bases are the (3-dimensional) faces of $S$, given by […] The apex for all 6 pyramids is the origin. Thus
$\lambda(C(\mathrm{graph}(d))) = \lambda(S) + \lambda(C(S \cap \{a = 1\})) - \lambda(C(S \cap \{a = x\})) + \lambda(C(S \cap \{b = -\epsilon\})) - \lambda(C(S \cap \{b = \epsilon\})) + \lambda(C(S \cap \{c = -\epsilon\})) - \lambda(C(S \cap \{c = \epsilon\}))$. (B.6)
Recall that the 4-dimensional volume of a pyramid is 1/4 the volume of the base times the height, and the volume of the base of each pyramid is simply the double integral over the appropriate slice. Notice that the second and third terms cancel, and each integral that remains is separable, with the same limits of integration on $a$. If we let $F(\epsilon)$ be the quantity […]
Proof. Let $GL_n(\mathbb{R})^+$ be the group of all invertible $n \times n$ matrices with positive determinant.
The map
$f : GL_n(\mathbb{R})^+ \to \mathbb{R}^+ \times SL_n(\mathbb{R})$ (B.11)
given by $f(g) = (\det(g), \det(g)^{-1/n} g)$ is a Lie isomorphism, allowing for a decomposition of the Haar measure on $GL_n(\mathbb{R})^+$ as follows: there exists a constant $c > 0$ such that for any compactly supported function $\phi \in C_c(GL_n(\mathbb{R})^+)$
$\int_{GL_n(\mathbb{R})^+} \phi(g) \det(g)^{-n}\,dg = c \int_{\mathbb{R}^+} \frac{dr}{r} \int_{SL_n(\mathbb{R})} \phi(ry)\,d\mu'(y)$. (B.12)
As any compact set in $GL_n(\mathbb{R})^+$ can be well approximated by cubes of the form $[-\epsilon, \epsilon]K'$, with $K' \subset SL_n(\mathbb{R})$ compact, the result follows.
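The map $f$ in (B.11) can be illustrated on a concrete matrix. The sketch below handles only the $3 \times 3$ case with a hand-written determinant; the names `det3` and `split_gl_plus` and the sample matrix are assumptions made for this example.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows (cofactor expansion)."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def split_gl_plus(g):
    """Return (det g, det(g)^(-1/3) g) for a 3 x 3 matrix g with positive determinant."""
    r = det3(g)
    if r <= 0:
        raise ValueError("g must have positive determinant")
    scale = r ** (-1.0 / 3)
    y = [[scale * entry for entry in row] for row in g]
    return r, y

g = [[2.0, 1.0, 0.0],
     [0.0, 3.0, 1.0],
     [1.0, 0.0, 4.0]]
r, y = split_gl_plus(g)  # r is det g; y is the rescaled matrix of determinant 1
```

Scaling every entry by $\det(g)^{-1/n}$ multiplies the determinant by $\det(g)^{-1}$, which is why the second component lands in $SL_n(\mathbb{R})$.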