Maximum Entropy Estimation Method

Ryu, Hang K.; Slottje, Daniel J.

doi:10.1007/978-3-642-58896-9_2


Part of the book series: Lecture Notes in Economics and Mathematical Systems (LNE, volume 459)


Abstract

Before using ONB and maximum entropy to analyze changes in inequality, this chapter develops the formal theory necessary to do so. That is, in this chapter we develop the mathematical and statistical properties of the maximum entropy (ME) method and then relate it to other well-known flexible functional form approaches. First we explain what we mean by the ME principle and then review Jaynes' (1979) concentration theorem to provide some justification for the ME method as a density estimation method. Since there has been little previous research on applying the ME principle to derive economic relationships,² we begin with the physicists' view of this principle. In that view, the entropy of the physical universe increases constantly because there is a continuous and irrevocable degradation of order into chaos. As a simple example, consider a closed system filled with a large number of interacting particles that are left to interact freely for a long time. The system will then reach a maximum entropy state. Statistical physicists find the ME density function for this equilibrium system, which is characterized by a constant average energy per particle. See, for example, the Maxwell-Boltzmann distribution in Rao (1973).
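To make the physicists' example concrete, here is a minimal sketch that assumes a finite set of discrete energy levels rather than a continuum; the energies, the target mean, and the function names `max_entropy_given_mean` and `entropy` are hypothetical. Maximizing entropy subject to a fixed average energy gives probabilities proportional to exp(-βE_i), the discrete analogue of the Maxwell-Boltzmann form, and the sketch finds β by bisection.

```python
import math

def max_entropy_given_mean(energies, mean_energy, lo=-50.0, hi=50.0, tol=1e-12):
    """ME probabilities over discrete energy levels with a fixed average energy.

    The solution has the Gibbs form p_i proportional to exp(-beta * E_i); beta is
    found by bisection, since the implied mean energy is strictly decreasing in beta.
    The bracket [lo, hi] is assumed wide enough to contain the solution.
    """
    if not min(energies) < mean_energy < max(energies):
        raise ValueError("mean_energy must lie strictly between the extreme energies")

    def probs(beta):
        exps = [-beta * e for e in energies]
        top = max(exps)  # shift exponents for numerical stability
        w = [math.exp(x - top) for x in exps]
        z = sum(w)
        return [wi / z for wi in w]

    def mean(beta):
        return sum(p * e for p, e in zip(probs(beta), energies))

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) > mean_energy:
            lo = mid  # mean too high: a larger beta is needed
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    return beta, probs(beta)

def entropy(p):
    """Shannon entropy -sum p_i log p_i (natural log)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

beta, p = max_entropy_given_mean([0.0, 1.0, 2.0, 3.0], 1.0)
print(beta, p, entropy(p))
```

With energies 0, 1, 2, 3 and mean energy 1, the probabilities decline geometrically, and their entropy exceeds that of any other distribution with the same mean, such as (0.4, 0.3, 0.2, 0.1).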


Notes

This chapter follows Ryu (1990).
An exception is the work of Georgescu-Roegen (1971).
Suppose we interpret our problem as a multinomial distribution with cell probabilities $\pi_1, \dots, \pi_n$. In $T$ independent trials, we observe counts $T_1, \dots, T_n$ for the $n$ outcomes, and we can define frequencies $f_1 = T_1/T, \dots, f_n = T_n/T$. Then the likelihood function of the multinomial distribution is

$$L(\pi_1, \dots, \pi_n) = \frac{T!}{T_1! \cdots T_n!} \prod_{i=1}^{n} \pi_i^{T_i}.$$

Therefore, in (2.3), we defined $W$ as the allowable number of permutations for any given sequence of numbers $T_1, \dots, T_n$, that is, $W = T!/(T_1! \cdots T_n!)$.
Mood, Graybill, and Boes (1974) give the Stirling formula in their Appendix A as

$$T! = (2\pi)^{1/2} e^{-T} T^{T+0.5} \exp\left[\frac{r(T)}{12T}\right], \qquad 1 - \frac{1}{12T+1} < r(T) < 1.$$

Hence $\log T! = \tfrac{1}{2}\log(2\pi) - T + (T + 0.5)\log T + r(T)/(12T) \approx T\log T - T$ as $T \to \infty$, where $\approx$ means approximately equal to.
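The step from $W$ to entropy can be checked numerically. The sketch below, with hypothetical function names, compares the exact $\log W$ (computed with the log-gamma function) against the approximation $-T\sum_i f_i \log f_i$, which follows from applying $\log T! \approx T\log T - T$ to every factorial and using $\sum_i T_i = T$. It also evaluates Stirling's formula without the $\exp[r(T)/12T]$ factor, so the gap to the exact $\log T!$ is $r(T)/(12T)$.

```python
import math

def log_multinomial_coefficient(counts):
    """Exact log W, where W = T! / (T_1! ... T_n!) and T is the sum of the counts."""
    T = sum(counts)
    return math.lgamma(T + 1) - sum(math.lgamma(c + 1) for c in counts)

def stirling_entropy_approx(counts):
    """log W approximated by -T * sum_i f_i log f_i (log T! ~ T log T - T for each factorial)."""
    T = sum(counts)
    return -T * sum((c / T) * math.log(c / T) for c in counts if c > 0)

def stirling_log_factorial(T):
    """0.5 log(2 pi) - T + (T + 0.5) log T: Stirling's formula without exp[r(T)/12T]."""
    return 0.5 * math.log(2 * math.pi) - T + (T + 0.5) * math.log(T)

freqs = (0.5, 0.3, 0.2)
for T in (10, 100, 1000, 10000):
    counts = [round(f * T) for f in freqs]
    print(T, log_multinomial_coefficient(counts) / stirling_entropy_approx(counts))
```

The printed ratio climbs toward 1 as $T$ grows, which is the sense in which maximizing $W$ amounts to maximizing the entropy of the observed frequencies.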
See Mead and Papanicolaou (1984) for the existence conditions.
Accuracy can be set to a given decimal place or to a given number of significant digits.
We can divide the unique convergence problem into a convergence problem and a uniqueness problem. In my experience with this algorithm, convergence has not been a problem, so we shall emphasize the uniqueness problem. Rewrite the second-round iteration $B^{(2)}c^{(2)} = d^{(1)}$ as $c^{(2)} = [B^{(2)}]^{-1}d^{(1)}$. If $\Lambda$ is an $N \times N$ diagonal matrix whose $n$th diagonal element is $1/n$, then $\Lambda[c^{(2)} - c^{(1)}] = \Lambda\{[B^{(2)}]^{-1}d^{(1)} - c^{(1)}\}$. If we define $e^{(2)} \equiv B^{(2)}\Lambda\{[B^{(2)}]^{-1}d^{(1)} - c^{(1)}\}$, then $B^{(2)}\Lambda[c^{(2)} - c^{(1)}] \equiv e^{(2)}$. If we set …, then we have the relationship (1*). The iteration method based on (1*) is equivalent to the iteration method based on (2.18)–(2.20). Therefore, we shall prove unique convergence of the iteration method based on (1*). Since $B^{(2)}$ (as well as $B^{(3)}, B^{(4)}, \dots$) is a positive definite matrix, unique convergence of this iteration method can be established by appealing to the Gale and Nikaido (1965) theorem. Let us elaborate on this. Suppose we have a differentiable mapping $g: S \to R^N$, where $S$ is a region in $R^N$ and $g(s) = (g_m(s))$ ($s \in S$, $m = 1, \dots, N$), the $g_m(s)$ being differentiable functions on $S$ with total differentials. Suppose we choose $x \in [0, 1]$ and define …; then …, where we used …. Therefore, …. Since $B^{(2)}$ (as well as $B^{(3)}, B^{(4)}, \dots$) is positive definite, unique convergence of this iteration method for $s$ follows from the Gale and Nikaido (1965) theorem. If $x \in (-\infty, +\infty)$, we can derive a similar expression.
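The iteration in this note solves a positive definite system $B c = d$ at each round. Equations (2.18)–(2.20) are not reproduced here, so the sketch below does not claim to be them; it assumes a standard Newton scheme for an ME density on $[0, 1]$ of the form $f(x) = \exp(-\sum_n \lambda_n x^n)$ with moment constraints $\int_0^1 x^n f(x)\,dx = \mu_n$, $n = 0, \dots, N$. In that scheme the matrix playing the role of $B$ is the moment matrix $H_{nm} = \int_0^1 x^{n+m} f(x)\,dx$, which is positive definite. The function names, the midpoint-rule quadrature, and the step-halving safeguard are illustrative choices.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def me_density_newton(mu, grid_size=2000, tol=1e-10, max_iter=100):
    """Multipliers lam of f(x) = exp(-sum_n lam[n] * x**n) on [0, 1] with E[x**n] = mu[n].

    mu[0] must be 1 (normalization). Each Newton round solves H step = g, where
    H[n][m] = integral of x**(n+m) f(x) is a positive definite moment matrix and
    g holds the current moment residuals.
    """
    k = len(mu)
    h = 1.0 / grid_size
    xs = [(i + 0.5) * h for i in range(grid_size)]  # midpoint-rule nodes

    def moments(lam, top):
        """Midpoint-rule approximations of the integral of x**j f(x), j = 0..top."""
        f = [math.exp(-sum(l * x ** n for n, l in enumerate(lam))) for x in xs]
        return [h * sum(fi * x ** j for fi, x in zip(f, xs)) for j in range(top + 1)]

    lam = [0.0] * k  # start from the uniform density
    for _ in range(max_iter):
        m = moments(lam, 2 * (k - 1))
        g = [m[n] - mu[n] for n in range(k)]
        if max(abs(v) for v in g) < tol:
            break
        H = [[m[n + j] for j in range(k)] for n in range(k)]
        step = solve_linear(H, g)
        # Step halving: shrink the Newton step until the squared residual falls.
        r_old, t = sum(v * v for v in g), 1.0
        while True:
            trial = [l + t * s for l, s in zip(lam, step)]
            mt = moments(trial, k - 1)
            if sum((mt[n] - mu[n]) ** 2 for n in range(k)) < r_old or t < 1e-8:
                break
            t *= 0.5
        lam = trial
    return lam

print(me_density_newton([1.0, 0.4, 0.22]))
```

Supplying the moments of the uniform density, `[1.0, 0.5, 1/3]`, returns multipliers close to zero, as it should. The step halving is a safeguard only; because the Newton direction reduces the squared residual for small steps, it rarely activates near the solution.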
Let $y \equiv 1/a$, so that $dy = -da/a^2$. Then we have …, where we have used the definition of the gamma function.
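This substitution is the standard route from an inverse-power kernel to the gamma function. Since the note's integrand is not shown here, the worked example below assumes, purely for illustration, an integrand of the form $a^{-(k+1)}e^{-b/a}$ with $k > 0$ and $b > 0$:

$$\int_0^\infty a^{-(k+1)} e^{-b/a}\,da = \int_0^\infty y^{k+1} e^{-by}\,\frac{dy}{y^2} = \int_0^\infty y^{k-1} e^{-by}\,dy = \frac{\Gamma(k)}{b^k}.$$

Here $da = -dy/y^2$, and the minus sign is absorbed by reversing the limits ($a \to 0$ corresponds to $y \to \infty$); the last step uses $\Gamma(k) = \int_0^\infty t^{k-1}e^{-t}\,dt$ with $t = by$.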
If $z \equiv (S + x_0)/a$, so that $dz = -(S + x_0)\,da/a^2$, then …
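To show (2.64), we derive several useful relationships. (i) Applying a formula from Zellner (1971, p. 372), we can show that $\Gamma((v+1)/2) / [(T/2)^{1/2}\,\Gamma(v/2)] \to 1$ as $T \to \infty$. (ii) …. (iii) If $A \equiv (\mu - x_0)^2/(v s^2)$, then $(1 + A)^{-T/2} = \exp[-(T/2)\log(1 + A)] \approx \exp[-(T/2)A] = \exp[-T(\mu - x_0)^2/(2 v s^2)] \to \exp[-(\mu - x_0)^2/(2 s^2)]$, since $\log(1 + A) \approx A$ for small $A$ and $T/v \to 1$. Using these relationships, we can derive (2.64).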
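Relationships (i) and (iii) say, in effect, that the Student-t kernel with $v$ degrees of freedom approaches the normal density. The sketch below checks this numerically under the assumption that $v$ grows with $T$ so that $T/v \to 1$; it therefore evaluates the ratio in (i) with $v/2$ in place of $T/2$. The function names are hypothetical, and the t density uses a location-scale parameterization with location `mu` and scale `s`.

```python
import math

def student_t_pdf(x, mu, s, v):
    """Student-t density with location mu, scale s > 0 and v > 0 degrees of freedom."""
    log_const = (math.lgamma((v + 1) / 2) - math.lgamma(v / 2)
                 - 0.5 * math.log(v * math.pi) - math.log(s))
    a = (x - mu) ** 2 / (v * s ** 2)  # plays the role of A in relationship (iii)
    return math.exp(log_const - (v + 1) / 2 * math.log1p(a))

def normal_pdf(x, mu, s):
    """Normal density with mean mu and standard deviation s."""
    return math.exp(-((x - mu) ** 2) / (2 * s ** 2)) / (s * math.sqrt(2 * math.pi))

def gamma_ratio(v):
    """Gamma((v+1)/2) / [(v/2)**0.5 * Gamma(v/2)], which tends to 1 as v grows."""
    return math.exp(math.lgamma((v + 1) / 2) - math.lgamma(v / 2) - 0.5 * math.log(v / 2))

for v in (5, 50, 500, 5000):
    print(v, gamma_ratio(v), student_t_pdf(1.0, 0.0, 1.0, v) - normal_pdf(1.0, 0.0, 1.0))
```

`gamma_ratio` increases toward 1 as $v$ grows, and the gap between the two densities shrinks roughly in proportion to $1/v$.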
Let us prove (2.75). Suppose we normalize the density function given in (2.21). From …, it follows that …. Therefore, we have proved (2.75).
We have required y(x) to be positive for all x. However, this requirement can easily be relaxed. If we can assume -A < y(x) for some large but finite positive constant A, then we can always transform y(x) into y*(x) = y(x) + A > 0. Therefore, the only case that violates our assumption is when y(x) is unbounded below; then no linear transformation of y(x) can make it positive. As a simple example of this exceptional case, suppose we have a bivariate normal distribution f(x, y); then we know the regression function is y(x) = α + βx. Therefore, when x approaches negative infinity, y(x) approaches negative infinity if β > 0. In this case, our required assumption is violated, and we have to estimate the bivariate joint pdf and derive the regression function from it.
Gallant's Fourier flexible form includes a quadratic trend term in the expansion. If we impose $\int x^n f(x)\,dx = v_n$ for $n = 1, 2$ and $\int \exp(inx) f(x)\,dx = \xi_n$ for $n = 0, \pm 1, \dots, \pm N$, then the ME method will produce both the trend terms and the Fourier series terms.
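The claim can be checked with the usual Lagrangian argument. The worked derivation below assumes that $f$ is real-valued, so that $\xi_{-n} = \bar{\xi}_n$, and that the multipliers for $\pm n$ are chosen as complex conjugates, $\mu_n = a_n + i b_n$ and $\mu_{-n} = \bar{\mu}_n$; the multiplier symbols $\lambda_1, \lambda_2, \mu_n, a_n, b_n$ are introduced here for illustration. Maximizing $-\int f \log f\,dx$ subject to the constraints above and setting the derivative of the Lagrangian with respect to $f(x)$ to zero gives

$$-\log f(x) - 1 - \lambda_1 x - \lambda_2 x^2 - \sum_{n=-N}^{N} \mu_n e^{inx} = 0.$$

Pairing the $\pm n$ terms, $\mu_n e^{inx} + \bar{\mu}_n e^{-inx} = 2a_n\cos nx - 2b_n\sin nx$, and $\mu_0$ is real because $e^{i0x} = 1$, so

$$\log f(x) = -(1 + \mu_0) - \lambda_1 x - \lambda_2 x^2 - \sum_{n=1}^{N}\left(2a_n\cos nx - 2b_n\sin nx\right),$$

that is, a quadratic trend plus a Fourier series of order $N$, the same structure as the Fourier flexible form.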
Jeffreys (1967) gives an example of the simplicity postulate: a physicist would first test whether the whole variation is random against the existence of a linear trend, then a linear law against a quadratic one, proceeding in order of increasing complexity. All we have to say is that the simpler laws have the greater prior probabilities. This is what Wrinch and Jeffreys called the simplicity postulate.


Author information

Authors and Affiliations

Department of Economics, Chung Ang University, Seoul, Korea
Prof. Hang K. Ryu
Department of Economics, Southern Methodist University, Dallas, TX, 75272, USA
Prof. Daniel J. Slottje



About this chapter

Cite this chapter

Ryu, H.K., Slottje, D.J. (1998). Maximum Entropy Estimation Method. In: Measuring Trends in U.S. Income Inequality. Lecture Notes in Economics and Mathematical Systems, vol 459. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-58896-9_2


DOI: https://doi.org/10.1007/978-3-642-58896-9_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64229-9
Online ISBN: 978-3-642-58896-9
eBook Packages: Springer Book Archive
