Bounding measures of genetic similarity and diversity using majorization

The homozygosity and the frequency of the most frequent allele at a polymorphic genetic locus have a close mathematical relationship, so that each quantity places a tight constraint on the other. We use the theory of majorization to provide a simplified derivation of the bounds on homozygosity J in terms of the frequency M of the most frequent allele. The method not only enables simpler derivations of known bounds on J in terms of M, it also produces analogous bounds on entropy statistics for genetic diversity and on homozygosity-like statistics that range in their emphasis on the most frequent allele in relation to other alleles. We illustrate the constraints on the statistics using data from human populations. The approach suggests the potential of the majorization method as a tool for deriving inequalities that characterize mathematical relationships between statistics in population genetics.

Measures of allele frequencies, including measures of genetic similarity and diversity, are often computed from data, leading to much subsequent interpretation and additional computation.
Recent studies have demonstrated that pairs of population-genetic summary statistics often have a close mathematical relationship, so that values of one quantity strongly constrain the values of a second quantity (Hedrick 1999, 2005; Long and Kittles 2003; Rosenberg and Jakobsson 2008; Maruki et al. 2012; Jakobsson et al. 2013; Edge and Rosenberg 2014). For example, for a polymorphic locus, Rosenberg and Jakobsson (2008) showed that two measures of the homogeneity of the allele frequencies in a population, homozygosity and the frequency of the most frequent allele, tightly constrain each other over the unit interval, so that each can be predicted by the other to within 1/4, and so that the mean range over the unit interval for one of the statistics given the other is $2/3 - \pi^2/18 \approx 0.1184$. Reddy and Rosenberg (2012) then tightened the bounds placed by homozygosity on the frequency of the most frequent allele and vice versa, in the case of a fixed value for the number of distinct alleles. These theoretical results provide guidance for interpreting computations of homozygosity and the frequency of the most frequent allele, including in homozygosity-based tests for evidence of natural selection (Rosenberg and Jakobsson 2008; Garud and Rosenberg 2015).
To further advance the study of mathematical properties of population-genetic statistics describing similarity and diversity of alleles in a population, we investigate the application of the theory of majorization, a mathematical and statistical approach concerned with evenness of arrayed structures (Marshall et al. 2010), to these statistics. Majorization provides general principles concerning maxima and minima, enabling bounds to be obtained for broad classes of functions. In addition, as a theory about mathematical functions in general, it readily suggests new functions to be used as summary statistics, functions whose mathematical bounds are achieved at the same allele frequency distributions that generate bounds for existing statistics. We exploit this aspect of majorization to investigate the relationship of the frequency of the most frequent allele to various homozygosity-related statistics as well as to the Shannon-Weaver entropy statistic for genetic diversity.
First, we formally introduce the population-genetic statistics of interest. Next, we describe the majorization framework and establish its relationship with convex functions, the key connection that enables the derivation of our mathematical bounds. We then obtain bounds given a fixed size for the frequency of the most frequent allele on homozygosity statistics and the Shannon-Weaver index. We provide simpler derivations of results reported by Rosenberg and Jakobsson (2008) and Reddy and Rosenberg (2012); these derivations naturally lead us to consider a larger family of homozygosity-related statistics that we call α-homozygosities, whose extreme values occur at the same allele frequency vectors as for standard homozygosity. Similarly, the Shannon-Weaver bounds can be extended to provide bounds on the more general Rényi entropies. We compare the constraints placed by the frequency of the most frequent allele on α-homozygosity for various choices of α, by applying the bounds to data on 783 multiallelic microsatellite loci in a sample of 1048 individuals drawn from worldwide human populations. The comparison helps extend the understanding of the empirical relationships among statistics describing genetic similarity and diversity in populations.

Statistics based on allele frequencies
Consider a population and a polymorphic genetic locus with I distinct alleles. We represent the allele frequencies of the locus by the frequency vector $p = (p_1, \ldots, p_I)$.
The components of p are arranged in descending order such that $p_i \ge p_j$ if $i < j$. We sometimes denote $p_1$, the largest allele frequency, by M. For all i, $p_i \in [0, 1]$, and the entries in p sum to 1. Note that $p_1 > 0$.
Homozygosity for a locus is the sum of the squares of allele frequencies at the locus, $J(p) = \sum_{i=1}^{I} p_i^2$. For diploid loci at Hardy-Weinberg proportions, J measures the total frequency of homozygotes in a population. For loci of any ploidy, it provides a measure of the homogeneity of an allele frequency distribution, approaching 0 for a distribution consisting of many rare alleles and equaling 1 if a distribution has only a single allelic type.
We also consider a generalization that we term α-homozygosity, which we define by $J_\alpha(p) = \sum_{i=1}^{I} p_i^\alpha$, where α > 1. The standard homozygosity J is the α-homozygosity for the case of α = 2. Next, we examine the Shannon-Weaver entropy index of diversity, also known as the Shannon-Wiener or Shannon index. This quantity is defined by $H(p) = -\sum_{i=1}^{I} p_i \log p_i$, taking 0 log 0 = 0. Unlike homozygosity, this index increases with diversity in the allele frequency distribution, rather than with homogeneity.
Note that the Shannon-Weaver index is a limiting case of the Rényi entropies, a family of diversity measures indexed by a variable α. For a specified value of α ∈ (0, 1) ∪ (1, ∞), the Rényi entropy of order α, or α-entropy for short, is defined by
$$H_\alpha(p) = \frac{1}{1-\alpha} \log \left( \sum_{i=1}^{I} p_i^\alpha \right). \quad (2.1)$$
$H_\alpha$ and $J_\alpha$ are related by $H_\alpha = \frac{1}{1-\alpha} \log(J_\alpha)$. As α → 1, by l'Hôpital's rule, $\lim_{\alpha \to 1} H_\alpha(p) = H(p)$ for any allele frequency vector p. We can then identify $\lim_{\alpha \to 1} H_\alpha$ with the Shannon-Weaver index H, treating the Shannon-Weaver index as the α = 1 case of the Rényi entropy.
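As an illustration of the statistics defined in this section, the following minimal Python sketch computes α-homozygosity, the Shannon-Weaver index, and the Rényi entropy from a frequency vector. The function names are hypothetical choices made here, natural logarithms are assumed, and the example vector is arbitrary.

```python
import math

def alpha_homozygosity(p, alpha=2.0):
    """Sum of the alpha-th powers of the allele frequencies (alpha = 2 gives J)."""
    return sum(x ** alpha for x in p)

def shannon_weaver(p):
    """Shannon-Weaver index with the convention 0 log 0 = 0 (natural log)."""
    return -sum(x * math.log(x) for x in p if x > 0)

def renyi_entropy(p, alpha):
    """Renyi entropy of order alpha; the alpha = 1 case is the Shannon-Weaver index."""
    if alpha == 1:
        return shannon_weaver(p)
    return math.log(alpha_homozygosity(p, alpha)) / (1.0 - alpha)

p = [0.5, 0.25, 0.25]              # arbitrary example frequency vector
J = alpha_homozygosity(p)          # standard homozygosity
H = shannon_weaver(p)
H2 = renyi_entropy(p, 2.0)         # equals -log(J)
H_near_1 = renyi_entropy(p, 1.0001)  # close to H, illustrating the limit
```

Evaluating the Rényi entropy at an order close to 1 gives a value close to the Shannon-Weaver index, which is a quick numerical check of the limit stated above.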

Majorization
The theory of majorization (Marshall et al. 2010) is concerned with the notion of evenness in comparisons between pairs of vectors. Given a vector $v = (v_1, \ldots, v_I)$ with I components, let $v_{[i]}$ denote its ith largest component, which is not necessarily the same as its ith component, $v_i$. For example, if v = (1, 5, 3, 4), then $v_{[1]} = 5$ but $v_1 = 1$, and $v_{[2]} = 4$ but $v_2 = 5$. For a pair of vectors v, w whose elements have the same sum, we compare their evenness by saying that v majorizes w if w is "more evenly distributed," or "less concentrated," than v. The following definition formalizes this concept.
Definition 2.2 (Majorization) Let $v, w \in \mathbb{R}^I$. Then v majorizes w if both of the following conditions hold. (1) $\sum_{i=1}^{I} v_i = \sum_{i=1}^{I} w_i$. (2) For each k from 1 to I − 1, $\sum_{i=1}^{k} v_{[i]} \ge \sum_{i=1}^{k} w_{[i]}$. If v majorizes w, then we write $v \succ w$.
The first condition states that the vectors have the same sum of components. The second condition states that for each k, the sum of the k largest components of v is greater than or equal to the corresponding sum of the k largest components of w. It can be verified from the definition that (0.75, 0.25) majorizes (0.5, 0.5) and (0.8, 0.1, 0.05, 0.05) majorizes (0.4, 0.3, 0.15, 0.15), but neither (0.5, 0.25, 0.25) nor (0.4, 0.4, 0.2) majorizes the other. Note that if $v \succ w$ and $w \succ v$, then w must be a permutation of v. This result follows from the fact that if $v \succ w$ and $w \succ v$, then $v_{[i]} = w_{[i]}$ for each i. Let $\Delta^{I-1}$ denote the set of nonnegative vectors of length I whose components sum to 1. With respect to majorization, the maximal vector in $\Delta^{I-1}$ is (1, 0, ..., 0) and the minimal vector is (1/I, ..., 1/I): for any $p \in \Delta^{I-1}$, $(1, 0, \ldots, 0) \succ p \succ (1/I, \ldots, 1/I)$.
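The two conditions in the definition translate directly into a partial-sum comparison on sorted vectors. The sketch below is one possible check, written for this illustration; the function name `majorizes` and the floating-point tolerance are assumptions, not part of the source.

```python
def majorizes(v, w, tol=1e-12):
    """Return True if v majorizes w: equal sums, and every partial sum of the
    k largest components of v is at least the corresponding sum for w."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same length")
    vs = sorted(v, reverse=True)
    ws = sorted(w, reverse=True)
    if abs(sum(vs) - sum(ws)) > tol:   # condition (1): same total
        return False
    sum_v = sum_w = 0.0
    for a, b in zip(vs, ws):           # condition (2): partial sums
        sum_v += a
        sum_w += b
        if sum_v < sum_w - tol:
            return False
    return True

# The examples given in the text.
ex1 = majorizes([0.75, 0.25], [0.5, 0.5])
ex2 = majorizes([0.8, 0.1, 0.05, 0.05], [0.4, 0.3, 0.15, 0.15])
ex3a = majorizes([0.5, 0.25, 0.25], [0.4, 0.4, 0.2])
ex3b = majorizes([0.4, 0.4, 0.2], [0.5, 0.25, 0.25])
```

The last pair shows that majorization is only a partial order: the first partial sums favor (0.5, 0.25, 0.25), but the second partial sums favor (0.4, 0.4, 0.2).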

Functions preserving majorization
Because majorization ranks vectors by evenness, it is natural to identify mathematical functions that preserve the ranking order. Such functions are termed isotone with respect to majorization: if v majorizes w, then an isotone function F outputs a value F(v) at least as large as F(w). A function is termed antitone with respect to majorization if, whenever $v \succ w$, the function outputs a smaller or equal value at v than at w. An isotone function is largest at maximal vectors with respect to the majorization order and smallest at minimal vectors, whereas the reverse is true for an antitone function.
An isotone function with respect to majorization provides a sensible index of concentration of vectors, whereas an antitone function provides a sensible index of diversity. As we shall see below, homozygosity is isotone, whereas the Shannon-Weaver index is antitone.
The functions that are isotone, or preserve the majorization order, are closely related to convex functions. Suppose S is a set of I-dimensional vectors, possibly $\mathbb{R}^I$. A function $F : S \to \mathbb{R}$ that preserves majorization on S is termed Schur-convex; the Schur-convex functions are by definition the isotone functions. Mathematically, F is Schur-convex if $v \succ w$ implies $F(v) \ge F(w)$ for all v, w ∈ S, and F is Schur-concave if $v \succ w$ implies $F(v) \le F(w)$. The Schur-concave functions are the antitone functions.
A useful method for identifying Schur-convex functions is the Schur-Ostrowski criterion (Marshall et al. 2010, p. 84).

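Theorem 2.3 (Schur-Ostrowski criterion) If F is symmetric in the components of its argument and all its first partial derivatives exist, then F is Schur-convex if and only if for every x in the domain of F and every pair of indices i ≠ j, $(x_i - x_j)\left(\frac{\partial F}{\partial x_i} - \frac{\partial F}{\partial x_j}\right) \ge 0$.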
The inequality is reversed for Schur-concave F.
We will have occasion to use a particular case of the Schur-Ostrowski criterion. The first derivative of a differentiable convex function $f : \mathbb{R} \to \mathbb{R}$ of one variable is nondecreasing. Therefore, for any pair of points $x_i, x_j$, $(x_i - x_j)(f'(x_i) - f'(x_j)) \ge 0$. Hence, a function F that is decomposable into individual, identical convex functions in each of its arguments satisfies the Schur-Ostrowski criterion. An analogous statement holds for Schur-concave functions in relation to concave functions. We state this claim formally.

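Corollary 2.4 If $f : S \to \mathbb{R}$ is convex on an interval $S \subseteq \mathbb{R}$, then $F(x) = \sum_{i=1}^{I} f(x_i)$ is Schur-convex on $S^I$. If f is strictly convex, then F is strictly Schur-convex. If f is concave (strictly concave), then F is Schur-concave (strictly Schur-concave).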
This corollary, in which the part about strict convexity appears on p. 92 of Marshall et al. (2010), is conveniently stated in the following form (Karamata 1932; see also pp. 156-157 of Marshall et al. 2010).
Theorem 2.5 (Karamata's inequality) Let $S \subseteq \mathbb{R}$ be an interval and let $f : S \to \mathbb{R}$ be convex, and let $v_1, \ldots, v_I, w_1, \ldots, w_I \in S$. If $v \succ w$, then $\sum_{i=1}^{I} f(v_i) \ge \sum_{i=1}^{I} f(w_i)$. If f is concave, then the inequality is reversed. If f is strictly convex or strictly concave, then equality holds if and only if the list of values $w_1, \ldots, w_I$ gives a permutation of $v_1, \ldots, v_I$.
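Karamata's inequality can be checked numerically on one of the majorizing pairs given earlier in the text. The sketch below evaluates the sums for the convex function x² (homozygosity) and the concave function −x log x (Shannon-Weaver index); the helper names are hypothetical, and a single example illustrates the inequality but does not prove it.

```python
import math

def separable_sum(p, f):
    """F(p) = sum of f over the components of p."""
    return sum(f(x) for x in p)

def square(x):
    return x * x                                   # convex

def neg_x_log_x(x):
    return -x * math.log(x) if x > 0 else 0.0      # concave, 0 log 0 = 0

# v majorizes w (example pair taken from the text).
v = [0.8, 0.1, 0.05, 0.05]
w = [0.4, 0.3, 0.15, 0.15]

J_v, J_w = separable_sum(v, square), separable_sum(w, square)
H_v, H_w = separable_sum(v, neg_x_log_x), separable_sum(w, neg_x_log_x)

# Convex f: the majorizing vector gives the larger sum.
# Concave f: the inequality is reversed.
convex_case_holds = J_v >= J_w
concave_case_holds = H_v <= H_w
```

The more concentrated vector v has the larger homozygosity and the smaller entropy, matching the isotone and antitone behavior described above.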
With these tools, we are now ready to study the constraints placed on J and H by M. In the next section, we apply Karamata's inequality to establish bounds on these statistics as functions of the fixed largest allele frequency. In establishing Theorems 3.2 and 3.9, the main results from which we obtain the bounds on the statistics, the proofs follow a similar general structure. In each proof, we characterize the vectors that lie at extremes with respect to the majorization order among the vectors in a space. We then show that for functions on that space satisfying convexity conditions, extreme values occur at vectors that are extreme with respect to the majorization order.

Results
We consider bounds on population-genetic statistics in terms of the frequency M of the most frequent allele in two settings. First, we consider an unspecified number of distinct alleles (Sect. 3.1). This section includes the bounds of Rosenberg and Jakobsson (2008) and our new bounds on α-homozygosity. Second, we consider a specified number of distinct alleles (Sect. 3.2). This section includes the bounds of Reddy and Rosenberg (2012) and new bounds on α-homozygosity, the Shannon-Weaver index, and the Rényi entropy.
To obtain the bounds, we rely on the observation that $f(x) = x^\alpha$ is strictly convex for α > 1 and $x \ge 0$, so that $J_\alpha$ is strictly Schur-convex by Corollary 2.4. The convexity of $f(x) = x^\alpha$ follows from the second derivative test for convexity, by $f''(x) = \alpha(\alpha - 1)x^{\alpha - 2} \ge 0$. Further, because $f''(x) > 0$ for all $x > 0$, f is strictly convex. We apply this observation in order to derive bounds on homozygosity obtained by Rosenberg and Jakobsson (2008), as well as to obtain bounds on α-homozygosity. We also rely on the fact that the Shannon-Weaver index is strictly Schur-concave as a consequence of Corollary 2.4 together with the fact that $f(x) = -x \log x$ is strictly concave for $x \ge 0$.
We now derive bounds on homozygosity, α-homozygosity, the Shannon-Weaver index, and the Rényi entropy that quantify the constraints placed by M. We will see that the majorization approach not only reproduces the results of Rosenberg and Jakobsson (2008) and Reddy and Rosenberg (2012), it also produces bounds for the more general α-homozygosity.

Unspecified number of distinct alleles
The following result was obtained by Rosenberg and Jakobsson (2008). This result indicates that for a fixed value of the frequency of the most frequent allele, homozygosity is maximized by setting as many allele frequencies as possible equal to the largest frequency, with at most one nonzero allele frequency remaining.
Theorem 3.1 (Theorem 2, Rosenberg and Jakobsson 2008) Consider a sequence of the allele frequencies at a locus, $p_1 \ge p_2 \ge \cdots$, with $p_i \in [0, 1)$, $\sum_i p_i = 1$, $p_1 = M$, and $J = \sum_i p_i^2$. Then (i) $J > M^2$, and (ii) $J \le 1 - M(K - 1)(2 - KM)$, where $K = \lceil M^{-1} \rceil$, with equality if and only if $p_i = M$ for $1 \le i \le K - 1$, $p_K = 1 - (K - 1)M$, and $p_i = 0$ for $i > K$.
This result was the main mathematical result of Rosenberg and Jakobsson (2008); to obtain it in a manner that provides a broader mathematical perspective, we prove a general theorem, Theorem 3.2, which applies to a general class of functions that includes homozygosity. By checking that the theorem applies to J, Theorem 3.1 will follow as a corollary. Let $\Delta = \bigcup_{I=1}^{\infty} \Delta^{I-1}$ be the set of all nonnegative vectors of finite length summing to 1. For a real number x, we use $\{x\} = x - \lfloor x \rfloor$ to denote its fractional part.

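Theorem 3.2 Let f be a continuous, nonnegative, convex function with f(0) = 0, and suppose that $F(p) = \sum_i f(p_i)$ is bounded on Δ. Then for any p ∈ Δ with $p_1 = M$: (i) $F(p) \ge f(M)$, and (ii) $F(p) \le \lfloor M^{-1} \rfloor f(M) + f(M\{M^{-1}\})$, with equality if $p_i = M$ for $1 \le i \le \lceil M^{-1} \rceil - 1$, $p_{\lceil M^{-1} \rceil} = 1 - (\lceil M^{-1} \rceil - 1)M$, and $p_i = 0$ for $i > \lceil M^{-1} \rceil$. If f is strictly convex, then this is the only configuration at which equality holds. If f is strictly convex and M < 1, then the inequality in (i) is strict.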
Theorem 3.2 indicates that any bounded function F that can be expressed as a sum of convex functions f of each argument and that satisfies other mild conditions is bounded below by f(M), where M is the largest component of the vector p at which F is being evaluated, and it cannot exceed the value given by evaluating F at the vector $\tilde{p} = (M, \ldots, M, M\{M^{-1}\}, 0, \ldots, 0)$, in which $\lfloor M^{-1} \rfloor$ components equal M.
To see how Theorem 3.1 follows from the more general Theorem 3.2, set $f(x) = x^2$. We know that f is continuous and strictly convex, with nonnegative range. It satisfies f(0) = 0, and moreover F is bounded, as $0 \le \sum_i p_i^2 \le \sum_i p_i = 1$.
To prove Theorem 3.2, we require a lemma concerning convex functions. This straightforward lemma states that for a convex function f, if certain conditions are satisfied, then the sum $\sum_i f(x_i)$ is bounded above by $f(\sum_i x_i)$.
Lemma 3.3 Let f be a nonnegative, continuous, convex function with f(0) = 0, and let $x_1, x_2, \ldots$ be nonnegative numbers whose sum converges. Then $\sum_{i=1}^{\infty} f(x_i) \le f\left(\sum_{i=1}^{\infty} x_i\right)$.
Proof The proof relies on Karamata's inequality applied to sequences of increasing length. First, observe that for any positive integer n, $(x_1 + \cdots + x_n, 0, \ldots, 0) \succ (x_1, x_2, \ldots, x_n)$. Hence, by Karamata's inequality applied to the convex function f (Theorem 2.5), $\sum_{i=1}^{n} f(x_i) \le f\left(\sum_{i=1}^{n} x_i\right) + (n - 1)f(0) = f\left(\sum_{i=1}^{n} x_i\right)$. Consider the limit as n → ∞. As a bounded increasing sequence, $\sum_{i=1}^{n} x_i$ converges to its limit, $\lim_{n \to \infty} \sum_{i=1}^{n} x_i = \sum_{i=1}^{\infty} x_i$, and the continuity of f then gives the claimed inequality.
Proof of Theorem 3.2 (ii) Let $u = (p_1, \ldots, p_N)$ be any vector in Δ with $p_1 = M$. Function f is assumed to be nonnegative, continuous, and convex with f(0) = 0, and because F is bounded, f satisfies the conditions of Lemma 3.3. Let $t = (M, \ldots, M, M\{M^{-1}\})$, with $\lfloor M^{-1} \rfloor$ entries equal to M. Because t and u possibly have distinct (finite) lengths, define $m = \max(\lfloor M^{-1} \rfloor + 1, N)$. Append zeroes to t or u, so that t and u have the same length, m. We prove that after this procedure is performed, $t \succ u$. This result would immediately imply the claim, because we could then apply Karamata's inequality (Theorem 2.5) to convex f and vectors t and u to obtain (ii). Indeed, because $t \succ u$ and f is convex, denoting the components of the padded vectors by $t_i$ and $u_i$, $F(u) = \sum_{i=1}^{m} f(u_i) \le \sum_{i=1}^{m} f(t_i) = F(t)$. Here, we use the fact that f(0) = 0, so that the additional zeroes do not affect F(t) or F(u).
We now verify that $t \succ u$. Observe that t and u have the same sum. Moreover, because $p_i \le M$ for each $i \le N - 1$, it follows that the sum of the j largest components of u, where $j \le \lfloor M^{-1} \rfloor$, is bounded above by jM. For any $j > \lfloor M^{-1} \rfloor$, the sum of the j largest components of t is 1, which is always an upper bound for the sum of the corresponding components of u. Thus, $t \succ u$ as claimed, and (ii) holds.
For the equality condition, note that equality in (ii) requires F(u) = F(t). For strictly convex f, F(u) = F(t) requires that u be a permutation of t, as otherwise, we would have F(t) > F(u) by the strict Schur-convexity of F that follows from Corollary 2.4. Because the components of u and t are arranged in decreasing order, equality in (ii) requires that u = t. Hence, vectors $\tilde{p}$ are the only vectors in Δ that achieve equality in (ii).
(i) Observe that for any p ∈ Δ with $p_1 = M$, we have, by the nonnegativity of f on [0, ∞), $F(p) = \sum_i f(p_i) \ge f(p_1) = f(M)$. In the case that f is strictly convex and M < 1, some component $p_i$ with $i \ge 2$ is positive; a nonnegative, strictly convex f with f(0) = 0 is positive on (0, ∞), so the inequality is strict.
Note that for convenience, in the statement of Theorem 3.2, both $\lceil M^{-1} \rceil$ and $\lfloor M^{-1} \rfloor$ appear. In verifying (ii), we have simultaneously considered two cases for which the proof proceeds in the same way: if M is the reciprocal of an integer, then $\lceil M^{-1} \rceil = \lfloor M^{-1} \rfloor$, and otherwise, $\lfloor M^{-1} \rfloor = \lceil M^{-1} \rceil - 1$. In the case that M is the reciprocal of an integer, t has a zero in position $\lfloor M^{-1} \rfloor + 1$ and $\{M^{-1}\} = 0$. The proof is not affected by these values of 0.
Having proven Theorem 3.2 and having shown that it implies Theorem 3.1, we now proceed to show that it implies a similar result for the more general α-homozygosity.
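Recall that α-homozygosity $J_\alpha(p)$ is defined by raising each component of an allele frequency vector p to the αth power and taking the sum across components. For α > 1, the function $f(x) = x^\alpha$ is continuous and nonnegative on [0, ∞) and satisfies f(0) = 0. We also see that $J_\alpha$ is bounded, as $0 \le \sum_i p_i^\alpha \le \sum_i p_i = 1$. Consequently, Theorem 3.2 provides bounds on the values of $J_\alpha$ for a fixed value $p_1 = M$ of the largest allele frequency.
Corollary 3.5 Suppose p ∈ Δ with $p_1 = M$, and let α > 1. Then (i) $J_\alpha(p) \ge M^\alpha$, with strict inequality if M < 1, and (ii) $J_\alpha(p) \le M^\alpha\left(\lfloor M^{-1} \rfloor + \{M^{-1}\}^\alpha\right)$. Equality in (ii) occurs if and only if $p_i = M$ for $1 \le i \le K - 1$, $p_K = 1 - (K - 1)M$, and $p_i = 0$ for $i > K$, where $K = \lceil M^{-1} \rceil$.
By setting α = 2 in Corollary 3.5, we recover Theorem 3.1. Thus, the majorization approach not only proves the result of Rosenberg and Jakobsson (2008), it finds that an analogous result holds for α-homozygosity for any α > 1. Figure 1 illustrates the effect of α on the constraints placed by M on α-homozygosity. Increasing α decreases both the lower bound on $J_\alpha$ (Fig. 1a) and the upper bound (Fig. 1b), shrinking the range of values that $J_\alpha$ can possess (Fig. 2).
We can also see that as α grows large,
$$\lim_{\alpha \to \infty} \frac{J_\alpha(p)}{M^\alpha} = \lim_{\alpha \to \infty} \sum_i \left(\frac{p_i}{M}\right)^\alpha = |\{i : p_i = M\}|. \quad (3.7)$$
Hence, for large α, the ratio of α-homozygosity to the αth power of the frequency of the most frequent allele is approximately the number of alleles that attain the maximal frequency. This result has the consequence that for large α, the quotient of $J_\alpha$ and $M^\alpha$ is an approximate indicator of the number of allelic types that achieve the highest frequency.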
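The bounds on α-homozygosity for an unspecified number of alleles can be evaluated numerically. The sketch below assumes the extremal configuration described in the text, in which as many alleles as possible have frequency M and at most one further allele carries the remainder; the function names are hypothetical, and the example values are arbitrary.

```python
import math

def j_alpha_bounds(M, alpha=2.0):
    """Lower and upper bounds on alpha-homozygosity given the largest
    frequency M, with no restriction on the number of alleles."""
    k = math.floor(1.0 / M)           # number of alleles that can sit at M
    r = max(1.0 - k * M, 0.0)         # frequency left for one further allele
    lower = M ** alpha
    upper = k * M ** alpha + r ** alpha
    return lower, upper

def ratio_to_max_power(p, alpha):
    """J_alpha / M^alpha, computed as a sum of ratios to avoid underflow."""
    M = max(p)
    return sum((x / M) ** alpha for x in p)

lo2, hi2 = j_alpha_bounds(0.4, alpha=2.0)    # about (0.16, 0.36)
lo5, hi5 = j_alpha_bounds(0.4, alpha=5.0)    # both bounds are smaller

# Two alleles share the maximal frequency, so the ratio tends to 2.
ratio = ratio_to_max_power([0.4, 0.4, 0.2], alpha=50)
```

For M = 0.4 and α = 2, the upper bound 0.36 agrees with the expression 1 − M(K − 1)(2 − KM) for K = 3, and the range between the bounds is narrower at α = 5 than at α = 2.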

Specified number of distinct alleles
So far, we have treated the number of distinct allelic types as unrestricted and possibly infinite. If a finite maximal number of distinct alleles can be specified, however, then the lower bound on homozygosity for a fixed size of the frequency of the most frequent allele can be tightened (Reddy and Rosenberg 2012). We now demonstrate that majorization can recover the lower bound on homozygosity given M under the additional constraint of a fixed maximal number of distinct alleles. As was true for an unspecified number of distinct alleles, the approach gives rise to a more general result that enables bound computations for a broader class of similarity and diversity measures. We extend Theorem 3.2 to produce a result about general convex functions, Theorem 3.9, for the case of a fixed maximal number of distinct alleles. This theorem produces an α-homozygosity generalization of the result of Reddy and Rosenberg (2012) on the lower bound on homozygosity at fixed M. It furthermore produces bounds on the Shannon-Weaver index and Rényi entropy for a fixed M and a fixed maximal number of distinct alleles.
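The following result was obtained by Reddy and Rosenberg (2012).
Theorem 3.8 (Reddy and Rosenberg 2012) Suppose $p \in \Delta^{I-1}$, where $I \ge 2$ is fixed with $M \in [1/I, 1]$, $p_1 = M$ is fixed, and $p_i \ge p_j$ whenever $i < j$. Then $M^2 + (1 - M)^2/(I - 1) \le J(p) \le 1 - M(K - 1)(2 - KM)$, where $K = \lceil M^{-1} \rceil$. Equality in the lower bound occurs if and only if $p_2 = \cdots = p_I = (1 - M)/(I - 1)$. Equality in the upper bound occurs under the same conditions as in Theorem 3.1.
To obtain Theorem 3.8, we prove a more general theorem, Theorem 3.9, which can be viewed as an extension of Theorem 3.2.
Theorem 3.9 Let $I \ge 2$ and $M \in [1/I, 1]$ be fixed, let $D = \{x \in \Delta^{I-1} : x_1 = M,\ x_1 \ge x_2 \ge \cdots \ge x_I\}$, and let $F(x) = \sum_{i=1}^{I} f(x_i)$, where f is continuous on [0, 1] with f(0) = 0. If f is convex, then for all x ∈ D,
$$f(M) + (I - 1) f\left(\frac{1 - M}{I - 1}\right) \le F(x) \le \lfloor M^{-1} \rfloor f(M) + f(M\{M^{-1}\}). \quad (3.10)$$
If f is concave, then the inequalities are reversed. If f is strictly convex or strictly concave, then each bound is attained at exactly one vector in D.
Proof Let $t = (M, \ldots, M, M\{M^{-1}\}, 0, \ldots, 0)$, with $\lfloor M^{-1} \rfloor$ entries equal to M, and let $u = (M, (1 - M)/(I - 1), \ldots, (1 - M)/(I - 1))$, which has I − 1 terms (1 − M)/(I − 1). Observe that t and u both lie in D. We must show that for any vector x ∈ D, F(x) never exceeds F(t) and never takes a value smaller than F(u).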
Because f is convex, by Corollary 2.4, F is Schur-convex. By Karamata's inequality (Theorem 2.5), if we can show that t majorizes every other vector x in D and u is majorized by every other vector x in D, then the inequalities in Eq. 3.10 follow as a consequence.
We first prove that $t \succ x$ for any x ∈ D. Observe that for any x ∈ D, $x_1, x_2, \ldots, x_I \le M$, and moreover, $\sum_{i=1}^{I} x_i = 1$. Hence, for each i with $1 \le i \le \lfloor M^{-1} \rfloor$, the sum of the first i terms of x satisfies $x_1 + \cdots + x_i \le iM$, where the right-hand side of the inequality gives the sum of the i largest components of t. For $\lfloor M^{-1} \rfloor + 1 \le i \le I$, observe that $x_1 + \cdots + x_i \le 1$, the right-hand side of which is again the sum of the i largest components of t. Because the partial sums of t are at least as large as the partial sums of x for all i, $t \succ x$, as claimed.
Next, we prove that $u \prec x$ for any x ∈ D. Observe that for any x, because its components are arranged in non-increasing order, for $i \ge 2$, the sum of the i largest components is always at least M + (i − 1)(1 − M)/(I − 1), which corresponds to the sum of the i largest components of u. Indeed, if it were not so, then for some j, the sum of the j largest components of x would be less than M + (j − 1)(1 − M)/(I − 1). Because $x_1 = M$, at least one of $x_2, \ldots, x_j$ would then be less than (1 − M)/(I − 1). Because the sum of all I components of x is 1, the remaining I − j components would have sum larger than (I − j)(1 − M)/(I − 1), so that at least one of the I − j smallest components would exceed (1 − M)/(I − 1), contradicting the non-increasing order of the entries of x. We conclude $u \prec x$, as claimed.
If f is concave, then F is Schur-concave (Corollary 2.4), so the arguments above imply that the inequalities in Eq. 3.10 are reversed. For the equality conditions, note that by the equality condition in Karamata's inequality (Theorem 2.5), F(x) = F(t) or F(x) = F(u) for strictly convex or strictly concave f requires that x be a permutation of t or u. It then follows from the nonincreasing order of entries in x that x = t or x = u.
Proof of Theorem 3.8 We set $f(x) = x^2$ in Theorem 3.9. Observe that for the lower bound expression, $f(M) + (I - 1) f\left(\frac{1 - M}{I - 1}\right) = M^2 + \frac{(1 - M)^2}{I - 1}$. Because $f(x) = x^2$ is strictly convex, the equality condition in Theorem 3.9 confirms that equality with the lower bound is obtained if and only if $p_2 = \cdots = p_I = (1 - M)/(I - 1)$. The upper bound expressions for F are identical for both Theorems 3.2 and 3.9. The upper bound in Theorem 3.8 and its equality condition both follow from the identity of the upper bounds in Theorems 3.2 and 3.9.
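Recalling that $f(x) = x^\alpha$ is strictly convex for α > 1 and making the substitution $f(x) = x^\alpha$, we obtain a more general result for α-homozygosities.
Corollary 3.13 Suppose $p \in \Delta^{I-1}$, where $I \ge 2$ is fixed with $M \in [1/I, 1]$, $p_1 = M$ is fixed, and $p_i \ge p_j$ whenever $i < j$. For α > 1, $M^\alpha + (1 - M)^\alpha/(I - 1)^{\alpha - 1} \le J_\alpha(p) \le M^\alpha\left(\lfloor M^{-1} \rfloor + \{M^{-1}\}^\alpha\right)$. Equality in the upper bound occurs under the same conditions as in Corollary 3.5. Equality in the lower bound occurs if and only if $p_i = (1 - M)/(I - 1)$ for $2 \le i \le I$.
We can quantify the additional constraint imposed by fixing the maximal number of distinct alleles by noting that the area of the region of permissible values shrinks from the original value of $S_\alpha = [1/(\alpha + 1)] \sum_{t=1}^{\infty} 1/[t(t + 1)^\alpha]$ (Eq. 3.6). First, the constraint on I forces $M \ge 1/I$. We denote by $L^I_\alpha$ and $U^I_\alpha$ the areas of the regions of the unit square bounded above by the lower and upper bounds, respectively, and we compute these areas in the "Appendix". Thus, denoting the area of the region of permissible values by $S^I_\alpha = U^I_\alpha - L^I_\alpha$, we have
$$S^I_\alpha = \frac{1}{\alpha + 1}\left[\sum_{t=1}^{I-1} \frac{1}{t(t + 1)^\alpha} - \frac{I - 1}{I^\alpha}\right]. \quad (3.14)$$
This quantity is less than $S_\alpha$ from Eq. 3.6, which both has more positive terms in its summation and does not subtract from the sum the positive quantity $[1/(\alpha + 1)](I - 1)/I^\alpha$.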
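The area of the permissible region for a fixed maximal number of alleles can be cross-checked by numerical integration. The sketch below assumes the lower bound obtained by evaluating $J_\alpha$ at the vector with one allele at M and the remaining I − 1 alleles equal, and the upper bound from the unspecified-I case; the closed form used for comparison is the sum over t up to I − 1 minus the subtracted term described in the text. Function names and the midpoint rule are choices made for this illustration.

```python
import math

def lower_bound(M, I, alpha):
    """J_alpha at (M, (1-M)/(I-1), ..., (1-M)/(I-1))."""
    return M ** alpha + (1.0 - M) ** alpha / (I - 1) ** (alpha - 1)

def upper_bound(M, alpha):
    """J_alpha with as many alleles as possible at M plus one remainder."""
    k = math.floor(1.0 / M)
    r = max(1.0 - k * M, 0.0)
    return k * M ** alpha + r ** alpha

def area_closed_form(I, alpha):
    """Assumed closed form for the area between the bounds on [1/I, 1]."""
    s = sum(1.0 / (t * (t + 1) ** alpha) for t in range(1, I))
    return (s - (I - 1) / I ** alpha) / (alpha + 1)

def area_numeric(I, alpha, n=100000):
    """Midpoint-rule integral of (upper - lower) over M in [1/I, 1]."""
    a = 1.0 / I
    h = (1.0 - a) / n
    total = 0.0
    for j in range(n):
        M = a + (j + 0.5) * h
        total += upper_bound(M, alpha) - lower_bound(M, I, alpha)
    return total * h

closed = area_closed_form(3, 2.0)     # equals 1/36 for I = 3, alpha = 2
numeric = area_numeric(3, 2.0)
```

Agreement between the two values for a few choices of I and α is a useful sanity check on the area expression, although it is not a substitute for the derivation in the "Appendix".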

Fig. 4 $S^I_\alpha$, the area of the region of permissible values for α-homozygosity, as α increases from 1.01 to 10 and the maximal number of distinct alleles I increases from 3 to 15. $S^I_\alpha$ is calculated using Eq. 3.14. As I increases, the curve approaches the shape of the $S_\alpha$ graph.

If the number of allelic types is finite, then we can apply Theorem 3.9 to obtain bounds on the Shannon-Weaver index if the frequency M of the most frequent allele is specified. We observed earlier that the Shannon-Weaver index is strictly Schur-concave, and is thus antitone with respect to majorization. In other words, if $v \succ w$, then $H(v) \le H(w)$: the more concentrated a vector is, the smaller the value of H it outputs. Using this fact, we have the following bounds on the Shannon-Weaver index for a fixed frequency of the most frequent allele.

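Corollary 3.16 Suppose $p \in \Delta^{I-1}$, where $I \ge 2$ is fixed with $M \in [1/I, 1]$, $p_1 = M$ is fixed, and $p_i \ge p_j$ whenever $i < j$. Then $\lfloor M^{-1} \rfloor M \log\frac{1}{M} + M\{M^{-1}\} \log\frac{1}{M\{M^{-1}\}} \le H(p) \le M \log\frac{1}{M} + (1 - M) \log\frac{I - 1}{1 - M}$. Equality in the upper bound occurs if and only if $p_i = (1 - M)/(I - 1)$ for $2 \le i \le I$. Equality in the lower bound occurs if and only if $p_i = M$ for $1 \le i \le K - 1$, $p_K = 1 - (K - 1)M$, and $p_i = 0$ for $i > K$, where $K = \lceil M^{-1} \rceil$.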
Here, we adopt the convention that 0 log ∞ = −0 log 0 = 0, so that if M is the reciprocal of a positive integer, the lower bound reduces to log(1/M).
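Proof We apply Theorem 3.9 to the continuous and concave function $f(x) = -x \log x = x \log(1/x)$, which is both non-negative on [0, 1] and satisfies f(0) = 0. Reversing the inequalities in Theorem 3.9 owing to the concavity of f, for the lower bound, recalling that $1 - \lfloor M^{-1} \rfloor M = M\{M^{-1}\}$, $H(p) \ge \lfloor M^{-1} \rfloor M \log\frac{1}{M} + M\{M^{-1}\} \log\frac{1}{M\{M^{-1}\}}$. For the upper bound, $H(p) \le M \log\frac{1}{M} + (I - 1)\frac{1 - M}{I - 1} \log\frac{I - 1}{1 - M} = M \log\frac{1}{M} + (1 - M) \log\frac{I - 1}{1 - M}$.
We can also obtain a more general result for the Rényi entropy $H_\alpha$ for α ∈ (0, 1) ∪ (1, ∞). Recall the set D and vectors t and u from the proof of Theorem 3.9, in which it was demonstrated that $t \succ x \succ u$ for any x ∈ D.
The Rényi entropies, though not generally concave for α > 1, are strictly Schur-concave for all α > 0 as a consequence of possessing the weaker property of quasiconcavity (Ho and Verdú 2015). Symmetric, quasiconcave functions are Schur-concave (Marshall et al. 2010, p. 98), and the Schur-Ostrowski criterion (Theorem 2.3) verifies that the Schur-concavity is strict: $(x_i - x_j)\left(\frac{\partial H_\alpha}{\partial x_i} - \frac{\partial H_\alpha}{\partial x_j}\right) = \frac{\alpha}{(1 - \alpha)\sum_k x_k^\alpha}(x_i - x_j)\left(x_i^{\alpha - 1} - x_j^{\alpha - 1}\right) \le 0$, with equality if and only if $x_i = x_j$. It follows from the definition of Schur-concavity that the strictly Schur-concave $H_\alpha$ satisfies $H_\alpha(t) \le H_\alpha(x) \le H_\alpha(u)$, with equality at the lower and upper bounds if and only if x = t and x = u, respectively. The Rényi entropy $H_\alpha$ then has for its lower and upper bounds the quantities $H_\alpha(t)$ and $H_\alpha(u)$, respectively:
$$H_\alpha(t) = \frac{1}{1 - \alpha} \log\left[M^\alpha\left(\lfloor M^{-1} \rfloor + \{M^{-1}\}^\alpha\right)\right], \qquad H_\alpha(u) = \frac{1}{1 - \alpha} \log\left[M^\alpha + \frac{(1 - M)^\alpha}{(I - 1)^{\alpha - 1}}\right]. \quad (3.18)$$
Considering the Shannon-Weaver index to be the α = 1 case of the Rényi entropy, we can combine these quantities with Corollary 3.16 to state the following corollary.
Corollary 3.19 (Bounds for Rényi entropy) Suppose $p \in \Delta^{I-1}$, where $I \ge 2$ is fixed with $M \in [1/I, 1]$, $p_1 = M$ is fixed, and $p_i \ge p_j$ whenever $i < j$. For α > 0 with α ≠ 1, $\frac{1}{1 - \alpha} \log\left[M^\alpha\left(\lfloor M^{-1} \rfloor + \{M^{-1}\}^\alpha\right)\right] \le H_\alpha(p) \le \frac{1}{1 - \alpha} \log\left[M^\alpha + \frac{(1 - M)^\alpha}{(I - 1)^{\alpha - 1}}\right]$, and for α = 1 the bounds are those of Corollary 3.16. The equality conditions are the same as in Corollary 3.16.
Following the approach we adopted for α-homozygosity, we can quantify the constraint placed by M on H by computing the area of the region of permissible values. Let R denote the rectangle $[1/I, 1] \times [0, \log I]$ in which (M, H) must lie, and denote by $L^I_{SW}$ and $U^I_{SW}$ the areas of the regions of R that are bounded above by the lower and upper bounds of H given M, respectively. We compute these quantities in the "Appendix", and obtain that the area of the permissible region for H, denoted by $S^I_{SW} = U^I_{SW} - L^I_{SW}$, satisfies
$$S^I_{SW} = \frac{1}{2}\left(1 - \frac{1}{I}\right) \log I - \frac{1}{2}\sum_{t=1}^{I-1} \frac{\log(t + 1)}{t(t + 1)}. \quad (3.20)$$
The sum in Eq. 3.20 converges as I → ∞: because $\log t < \sqrt{t}$ for $t \ge 1$, $\log(t + 1)/[t(t + 1)] < 1/(t\sqrt{t + 1}) < t^{-3/2}$, so that the sum is bounded above by the convergent sum $\sum_{t=1}^{\infty} t^{-3/2}$. Dividing $S^I_{SW}$ by the total area of rectangle R, or (1 − 1/I) log I, we see that as I → ∞,
$$\frac{S^I_{SW}}{(1 - 1/I) \log I} = \frac{1}{2} - \frac{\sum_{t=1}^{I-1} \log(t + 1)/[t(t + 1)]}{2(1 - 1/I) \log I} \to \frac{1}{2}.$$
Hence, with a large number of allelic types, the permissible values of H span about half the area of the rectangle R in which (M, H) must lie. Because a positive term is subtracted from 1/2 in the ratio $S^I_{SW}/[(1 - 1/I) \log I]$, $S^I_{SW}$ is strictly smaller than (1/2)(1 − 1/I) log I. Figure 6 plots $S^I_{SW}$ as a function of I. With the graph of (1/2)(1 − 1/I) log I shown for comparison, we observe that $S^I_{SW}$ is bounded above by (1/2)(1 − 1/I) log I, and hence also by (1/2) log I, in accord with our analytical observation.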
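The area of the permissible region for the Shannon-Weaver index can also be checked numerically. The sketch below assumes that the lower bound is the entropy of the vector with as many alleles as possible at M plus one remainder, and that the upper bound is the entropy of the vector with one allele at M and the other I − 1 alleles equal. The closed form it compares against was obtained here by direct integration of those two bounds and is assumed to correspond to Eq. 3.20; all function names are hypothetical.

```python
import math

def x_log_inv_x(x):
    """x log(1/x) with the convention 0 log 0 = 0."""
    return -x * math.log(x) if x > 0 else 0.0

def shannon_lower(M):
    """Entropy of (M, ..., M, remainder): the most concentrated vector."""
    k = math.floor(1.0 / M)
    r = max(1.0 - k * M, 0.0)
    return k * x_log_inv_x(M) + x_log_inv_x(r)

def shannon_upper(M, I):
    """Entropy of (M, (1-M)/(I-1), ..., (1-M)/(I-1)): the most even vector."""
    rest = max(1.0 - M, 0.0)
    return x_log_inv_x(M) + (I - 1) * x_log_inv_x(rest / (I - 1))

def area_closed_form(I):
    """Assumed closed form: half the rectangle area minus half a finite sum."""
    s = sum(math.log(t + 1) / (t * (t + 1)) for t in range(1, I))
    return 0.5 * (1.0 - 1.0 / I) * math.log(I) - 0.5 * s

def area_numeric(I, n=100000):
    """Midpoint-rule integral of (upper - lower) over M in [1/I, 1]."""
    a = 1.0 / I
    h = (1.0 - a) / n
    total = 0.0
    for j in range(n):
        M = a + (j + 0.5) * h
        total += shannon_upper(M, I) - shannon_lower(M)
    return total * h

closed5 = area_closed_form(5)
numeric5 = area_numeric(5)
ratio_large_I = area_closed_form(10**6) / ((1 - 1e-6) * math.log(10**6))
```

For I = 2 the two bounds coincide, because the frequency vector is fully determined by M, and the closed form correctly returns zero. The ratio to the rectangle area increases toward 1/2 only slowly, since the subtracted term is divided by log I.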

Application to data
To illustrate the bounds on α-homozygosity and the Shannon-Weaver index, we plot data on allele frequencies from a human population-genetic data set alongside the bounds in Corollaries 3.5 and 3.16. We use 783 multiallelic microsatellite loci studied in 1048 individuals drawn from populations worldwide (Rosenberg et al. 2005). For each locus, we treat sample frequencies as parametric allele frequencies, and we obtain J α , H , and M in the full collection of individuals. The missing data rate was 3.7% (Rosenberg et al. 2005), and the minimum nonzero frequency was 1/2094, observed at a locus with one missing individual. Figure 7 plots α-homozygosity for each of several choices of α, considering all 783 loci. For smaller α, α-homozygosities tend to lie nearer to the upper bound in the permissible range. In fact, for α = 1.01, α-homozygosity is close to the theoretical maximum. Because low values of α give substantial weight to subsequent alleles after the most frequent one, the fact that such alleles tend to have nontrivial frequencies (at least 1/2094) causes α-homozygosity at low α to lie near the upper bound. As α increases, the α-homozygosities shift away from the upper bound. For α = 5 and α = 10, the tight constraints placed by M on J α reduce the range of permissible values, and data points are close to the lower bound of a tight range.
In Fig. 7D, the α-homozygosity values of three loci stand out for lying close to their corresponding theoretical maxima. The allele frequencies associated with these loci, arranged in decreasing order, appear in Table 1. In each case, α-homozygosity lies close to the maximum because for a value of M > 1/2, the allele frequency vector approximates the scenario with frequencies M and 1 − M that produces the maximal J α .
Interestingly, one of the three loci, TGA012P, had previously been chosen as a particularly clear example of a loss of alleles that occurred during ancient bottlenecks that accompanied human migrations outward from Africa (Figure 2 of Rosenberg and Kang 2015). All six rare alleles at the locus occur in Africa, three of them exclusively so; in Native Americans, distant from the initial source of human genetic diversity in Africa, only the most frequent allele is present. The pattern accords with a scenario in which human migrants out of Africa possessed only a subset of the available alleles, only the most frequent of which was present in later migrants into the Americas. Indeed, because rare alleles are often exclusive to Africa, an allele frequency vector with many rare alleles and with second highest frequency near 1 − M is a potential candidate for clearly illustrating the loss of alleles during human migrations.
Recalling that for large α, $J_\alpha/M^\alpha$ approximates the number of distinct alleles with frequency M, we can identify from among the larger values of $J_{10}/M^{10}$ the loci with multiple alleles of comparable frequency to M. For example, the leftmost locus, with $J_{10}/M^{10} \approx 3$, corresponds to locus GGAT2C03, whose three most frequent alleles have similar frequencies, 0.1136, 0.1091, and 0.1067. Locus GATA88F03P has 0.2404 and 0.2399 for its two highest frequencies, with $J_{10}/M^{10} \approx 2.0108$, and locus D10S1423 has $p_1 = 0.3069$ and $p_2 = 0.3064$, with $J_{10}/M^{10} \approx 1.9968$. For most loci, however, the nearest integer to $J_{10}/M^{10}$ is 1, indicating that one allele is substantially more frequent than the others.
In Fig. 9, we plot the Shannon-Weaver indices of the loci alongside the lower bound and upper bounds for different choices of the number of distinct alleles I , in accord with Corollary 3.16. The upper bound lines classify the Shannon-Weaver indices into multiple regions, with a data point lying above an upper bound line only if the number of nonzero allele frequencies at the locus exceeds the number of distinct alleles I associated with the line. We find in Fig. 9 that the Shannon-Weaver indices for the 783 loci mostly lie well below the theoretical maxima. The highest Shannon-Weaver index observed is H ≈ 2.6521, at the locus D22S683, for which M ≈ 0.1867 and I = 32; this value is lower than the theoretical upper bound of 3.2743 computed at these values of M and I from Corollary 3.16.
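The upper bound quoted for locus D22S683 can be reproduced from the rounded values of M and I given in the text. The sketch below assumes natural logarithms and the upper-bound expression M log(1/M) + (1 − M) log[(I − 1)/(1 − M)], which corresponds to one allele at frequency M and the remaining I − 1 alleles at equal frequencies; because M is rounded to four decimals, only approximate agreement with the reported value is expected.

```python
import math

def shannon_upper_bound(M, I):
    """Entropy of the vector (M, (1-M)/(I-1), ..., (1-M)/(I-1))."""
    if M >= 1.0:
        return 0.0
    return M * math.log(1.0 / M) + (1.0 - M) * math.log((I - 1) / (1.0 - M))

M_obs = 0.1867      # rounded largest allele frequency reported in the text
I_obs = 32          # number of distinct alleles reported in the text
H_obs = 2.6521      # observed Shannon-Weaver index reported in the text

upper = shannon_upper_bound(M_obs, I_obs)
gap = upper - H_obs  # how far the observed index lies below the maximum
```

The computed bound is close to the reported 3.2743, and the observed index lies roughly 0.6 below it.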

Discussion
Using majorization, we have obtained bounds on homozygosity, α-homozygosity, the Shannon-Weaver index, and the Rényi entropy in relation to the frequency M of the most frequent allele. For homozygosity, majorization recovers the bounds obtained by Rosenberg and Jakobsson (2008) in the case of an unspecified number of distinct alleles (Theorem 3.1) and by Reddy and Rosenberg (2012) for a fixed maximal number of distinct alleles I (Theorem 3.8). Moreover, in the fixed-I case, α-homozygosity for arbitrary α > 1, the Shannon-Weaver index, and the Rényi entropy for α > 0 achieve their extrema at the same allele frequency vectors that produce the minimal and maximal homozygosity (Corollaries 3.13, 3.16, 3.19).
The results not only simplify the derivation of bounds on homozygosity given M, they also illustrate that α-homozygosity for α > 1 behaves similarly to the standard 2-homozygosity in its dependence on M, with an increasing influence for M as α increases. Owing in part to the diploidy of many species of interest, in which individuals possess two alleles at a locus, 2-homozygosity (the α = 2 case of α-homozygosity) has been a natural statistic for use in measuring genetic variation. Homozygosity represents both the probability that two alleles drawn at random from a population are identical and the probability that the two alleles of a diploid individual have identical types. In α-ploids for α > 2, the analogous probability that all α allelic copies in an individual at a locus are identical is α-homozygosity. As ploidy increases past 2, the probability that all α alleles are identical is more strongly influenced by M than it is for diploids. In the extreme case of large α, we found that the ratio $J_\alpha / M^\alpha$ approximates the number of distinct alleles whose frequencies are near M (Eq. 3.7).
The varying emphasis on M of our α-homozygosity statistics is of interest in settings in which 2-homozygosity is currently used. Rosenberg and Jakobsson (2008) commented that tests for alleles driven rapidly to high frequency by positive natural selection, which search for regions with high haplotype homozygosity, use homozygosity as a way of detecting scenarios with a high value of the frequency of the most frequent haplotype. Garud and Rosenberg (2015) developed homozygosity-based tests that search for soft selective sweeps, in which positive selection has inflated the frequencies of multiple haplotypes rather than a single haplotype, by focusing on haplotypes other than the most frequent one. In both cases, α-homozygosity for different α could potentially be used: small α < 2 in the latter soft-sweep case, placing more emphasis on subsequent haplotypes, and large α > 2 in the former "hard-sweep" case, focusing on the highest-frequency haplotype.
Our results on the Shannon-Weaver index can enable further insight into the statistic in population-genetic settings. Although this statistic has been used less often than homozygosity, it is of interest for its historical use (e.g. Lewontin 1972), for possible comparisons across data types to areas where it appears more frequently, and for use in population-genetic settings in its own right. In the ecological context, where H measures the diversity of a distribution whose frequencies correspond to species abundances rather than allele frequencies, if the number of species is fixed at I, then H is bounded above by log I (Legendre and Legendre 1998, pp. 239-245). In Corollary 3.16, however, we have shown that if the largest frequency in the distribution is also specified, then a further constraint is placed on the theoretical maximum of H. The upper bound cannot exceed log I, and it is in fact less than log I except in the case that M = 1/I and all I species have equal frequency. Furthermore, as the number of "species" I increases without bound, at most half the area of the rectangle $R_I = [1/I, 1] \times [0, \log I]$ enclosing potential locations of the pair of quantities (M, H) contains permissible pairs of values for M and H (Eq. 3.21). Thus, our analysis finds that, averaged over permissible values of M, H is substantially more constrained than might be surmised from knowledge that its maximal value across all M is log I.
We have found in our data analysis that values near the bounds are obtained by empirical allele frequencies; the loci generate values of α-homozygosity and the Shannon-Weaver index that cover much of the permissible range for these quantities, especially for the more intermediate values of α. The bounds assist in the interpretation of the distributions across loci of $(M, J_\alpha)$ and $(M, H)$, describing their placement within the permissible range; the analysis shows that the bounds are useful for clarifying constraints on data points. Outliers in the plots uncover loci of potential interest, with the ratio $J_\alpha / M^\alpha$ identifying loci with similar frequencies for the two or three most frequent alleles, and the proximity to the upper bound of $J_\alpha$ uncovering an illustration of a serial loss of alleles during human migrations.
Our contributions have utilized majorization and the Schur-convexity and Schur-concavity of the functions used in calculating statistics in population genetics. Each statistic we studied captures the intuitive property that for a fixed sum of frequencies and a fixed maximal number of nonzero components, the least majorized vector, containing only a single nonzero entry, has the lowest value of a diversity index and the highest value of a similarity index, and the most majorized vector, containing a maximal number of equal entries, has the lowest value of a similarity index and the highest value of a diversity index. Indeed, a much stronger result holds, in that these statistics or their additive inverses preserve the majorization order of input vectors, and for fixed I, their extrema are achieved at the same frequency vectors.
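The order-preserving property can be made concrete with a small sketch that checks majorization through partial sums of decreasingly sorted entries and then evaluates homozygosity and the Shannon-Weaver index along a chain of vectors. The helper names and the four frequency vectors are hypothetical choices for illustration; the partial-sum test is the standard definition of majorization.

```python
import math


def majorizes(x, y, tol=1e-12):
    """True if x majorizes y: equal length and sum, and every partial sum of
    the decreasingly sorted x is at least the matching partial sum of y."""
    xs = sorted(x, reverse=True)
    ys = sorted(y, reverse=True)
    if len(xs) != len(ys) or abs(sum(xs) - sum(ys)) > tol:
        return False
    partial_x = partial_y = 0.0
    for a, b in zip(xs, ys):
        partial_x += a
        partial_y += b
        if partial_x < partial_y - tol:
            return False
    return True


def homozygosity(freqs):
    """Sum of squared allele frequencies."""
    return sum(p * p for p in freqs)


def shannon_weaver(freqs):
    """H = -sum p_i log p_i with natural logarithms, skipping zeros."""
    return -sum(p * math.log(p) for p in freqs if p > 0)


# Hypothetical chain: each vector majorizes the one after it.
chain = [
    [1.0, 0.0, 0.0, 0.0],      # single nonzero entry
    [0.7, 0.2, 0.1, 0.0],
    [0.4, 0.3, 0.2, 0.1],
    [0.25, 0.25, 0.25, 0.25],  # equal entries
]

for freqs in chain:
    print(freqs, f"J = {homozygosity(freqs):.4f}", f"H = {shannon_weaver(freqs):.4f}")
```

Along this chain homozygosity decreases while the Shannon-Weaver index increases, as expected for a Schur-convex similarity index and a Schur-concave diversity index.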
The connection of majorization to the measures suggests that one aspect of assessing if a new proposed diversity or similarity measure is sensible is an evaluation of whether it or its additive inverse preserves majorization. The homozygosity and Shannon-Weaver statistics have this property, as do the α-homozygosities and Rényi entropies. Among statistics that are isotone or antitone with respect to majorization, it could then be evaluated if some are preferable for a particular purpose, as noted above for the potential role of different α for detection of different forms of positive selection. Note that the Rényi entropies for α ∈ (0, 1) ∪ (1, ∞) do not have the simpler form $\sum_{i=1}^{I} f(p_i)$ possessed by α-homozygosity and the Shannon-Weaver index; our analysis illustrates that the majorization method applies not only to statistics that take on a form $\sum_{i=1}^{I} f(p_i)$ that sums analogous quantities computed separately for distinct alleles, but also to a broader class of multivariate Schur-concave functions.
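The remark about Rényi entropies can be illustrated with a brief sketch. It assumes the standard definition of the Rényi entropy of order α, namely the logarithm of $\sum_{i=1}^{I} p_i^\alpha$ divided by 1 − α, which is a transformation of a sum rather than a sum of per-allele terms. The function name and the example vectors are hypothetical.

```python
import math


def renyi_entropy(freqs, alpha):
    """Rényi entropy of order alpha (alpha > 0, alpha != 1), natural logarithm.

    Assumes the standard definition log(sum p_i**alpha) / (1 - alpha)."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    power_sum = sum(p ** alpha for p in freqs if p > 0)
    return math.log(power_sum) / (1.0 - alpha)


# Hypothetical chain in which each vector majorizes the one after it.
chain = [
    [1.0, 0.0, 0.0, 0.0],
    [0.7, 0.2, 0.1, 0.0],
    [0.4, 0.3, 0.2, 0.1],
    [0.25, 0.25, 0.25, 0.25],
]

for alpha in (0.5, 2, 10):
    values = [renyi_entropy(freqs, alpha) for freqs in chain]
    print(alpha, [f"{v:.4f}" for v in values])
```

For each of the three orders, the entropy does not decrease as the vectors move toward equal frequencies, and it reaches log 4 at the uniform vector, in line with Schur-concavity on both sides of α = 1.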
This study both contributes new results and brings methods from another context to the study of well-known population-genetic statistics. We have highlighted that the strong dependence of homozygosity on M identified by Rosenberg and Jakobsson (2008) and Reddy and Rosenberg (2012) can be either magnified by considering α-homozygosity for α > 2 or lessened by using 1 < α < 2, and that the Shannon-Weaver index is constrained in an interval of size well below log I if M is specified. We have also observed examples of these theoretical results in computations with data from human populations. We suggest that the fact that the majorization approach obtains new results on statistics as fundamental as variant forms of homozygosity and the Shannon-Weaver index hints that it might have considerable potential to contribute to mathematical bounds on additional population-genetic statistics.